Simplifying parsed data
The functions below offer simple ways to refine data parsed from OSM. For more elaborate geospatial operations, such as simplifying the geometry of shapes or making them more coherent, please refer to dedicated geospatial packages.
[2]:
# Load the required packages
import matplotlib.pyplot as plt
import sys
sys.path.append('') # replace '' with your-path-to/osm-flex/src
import osm_flex
import osm_flex.download as dl
import osm_flex.extract as ex
import osm_flex.config
import osm_flex.simplify as sy
osm_flex.enable_logs()
Example: Remove (near-)duplicates
[3]:
# get an OSM dataset for illustration purposes
iso3 = 'CHE'
path_che_dump = dl.get_country_geofabrik(iso3)
INFO:osm_flex.download:Skip existing file: /Users/evelynm/osm/osm_bpf/switzerland-latest.osm.pbf
In OSM, the same place may be tagged as a point (POI), as its explicit shape (e.g. a building polygon, or rooms / areas within a larger building), or as both. Extracting points and polygons together therefore effectively yields near-duplicates. The following example illustrates this with the pre-written healthcare parser:
[4]:
# Query yields point and multi-polygon data
gdf_che_health = ex.extract_cis(path_che_dump, 'healthcare')
print(f'Number of results: {len(gdf_che_health)}')
INFO:osm_flex.extract:query is finished, lets start the loop
extract points: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 1943/1943 [00:14<00:00, 133.47it/s]
INFO:osm_flex.extract:query is finished, lets start the loop
extract multipolygons: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 1407/1407 [00:52<00:00, 26.60it/s]
Number of results: 3350
[5]:
gdf_che_health = sy.remove_contained_points(gdf_che_health)
print(f'Number of results after removing points contained in polygons: {len(gdf_che_health)}')
Number of results after removing points contained in polygons: 3257
[6]:
gdf_che_health = sy.remove_contained_polys(gdf_che_health)
print(f'Number of results after removing polygons contained in larger polygons: {len(gdf_che_health)}')
Number of results after removing polygons contained in larger polygons: 2455
[7]:
gdf_che_health = sy.remove_exact_duplicates(gdf_che_health)
print(f'Number of results after removing exact geometrical duplicates: {len(gdf_che_health)}')
Number of results after removing exact geometrical duplicates: 2455
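To make the idea behind the first step more concrete, here is a minimal conceptual sketch in plain Python of what "removing points contained in polygons" means. It is not how `osm_flex.simplify` is implemented; the helper names `point_in_polygon` and `drop_contained_points` and the toy coordinates are hypothetical, and points lying exactly on a polygon boundary are not handled carefully.

```python
def point_in_polygon(point, ring):
    """Ray-casting test: True if the point lies inside the closed ring."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def drop_contained_points(points, polygons):
    """Keep only the points that fall inside none of the polygons."""
    return [
        p for p in points
        if not any(point_in_polygon(p, ring) for ring in polygons)
    ]


# Toy data (hypothetical): one building footprint and three POIs
hospital_footprint = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
pois = [(1.0, 1.0), (5.0, 1.0), (2.0, 2.5)]

kept = drop_contained_points(pois, [hospital_footprint])
print(kept)
```

Only the POI outside the footprint survives; the two points inside it are treated as near-duplicates of the polygon.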
Example: Remove small polygons
[10]:
gdf_che_forest = ex.extract(path_che_dump, 'multipolygons', ['landuse', 'name'], "landuse='forest'")
print(f'Number of results: {len(gdf_che_forest)}')
INFO:osm_flex.extract:query is finished, lets start the loop
extract multipolygons: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 76220/76220 [00:43<00:00, 1762.81it/s]
Number of results: 76220
[11]:
gdf_che_forest = gdf_che_forest.to_crs('epsg:21781') # metre-based CRS for Switzerland
min_area = 100
gdf_che_forest = sy.remove_small_polygons(gdf_che_forest, min_area) # remove all polygons with area < 100 m2 (min_area is always in the units of the respective CRS)
print(f'Number of results after removal of small polygons: {len(gdf_che_forest)}')
Number of results after removal of small polygons: 74484
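The reprojection step above matters because the area threshold is interpreted in the units of the CRS. The following plain-Python sketch illustrates this with the shoelace formula; it is only an illustration under simplified assumptions (planar rings without holes), not the implementation used by `sy.remove_small_polygons`, and the names `ring_area` and `drop_small` as well as the coordinates are hypothetical.

```python
def ring_area(ring):
    """Planar area of a closed ring via the shoelace formula (in squared coordinate units)."""
    n = len(ring)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def drop_small(rings, min_area):
    """Keep rings whose area is at least min_area (same units as the coordinates)."""
    return [r for r in rings if ring_area(r) >= min_area]


# Coordinates in metres (as in a metre-based CRS): 25 m2 and 200 m2
small_m = [(0.0, 0.0), (5.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
large_m = [(0.0, 0.0), (20.0, 0.0), (20.0, 10.0), (0.0, 10.0)]
kept = drop_small([small_m, large_m], min_area=100)
print(len(kept))

# The same kind of shape in degrees (lon/lat) has a tiny "area" in squared degrees,
# so a threshold of 100 would remove it regardless of its real size on the ground.
square_deg = [(8.0, 47.0), (8.0001, 47.0), (8.0001, 47.0001), (8.0, 47.0001)]
print(ring_area(square_deg) < 100)
```

With metre coordinates only the larger polygon passes the threshold of 100, whereas any polygon expressed in degrees would fall below it, which is why the data are converted to a metre-based CRS first.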