Simplifying parsed data

The following functions provide simple ways to refine data parsed from OSM. For more elaborate geospatial operations, such as simplifying the geometry of shapes or making them more coherent, please refer to dedicated geospatial packages.

[2]:
# Load the required packages
import matplotlib.pyplot as plt

import sys
sys.path.append('') # replace '' with your-path-to/osm-flex/src

import osm_flex
import osm_flex.download as dl
import osm_flex.extract as ex
import osm_flex.config
import osm_flex.simplify as sy

osm_flex.enable_logs()

Example: Remove (near-)duplicates

[3]:
# get an OSM dataset for illustration purposes
iso3 = 'CHE'
path_che_dump = dl.get_country_geofabrik(iso3)
INFO:osm_flex.download:Skip existing file: /Users/evelynm/osm/osm_bpf/switzerland-latest.osm.pbf

The same place is sometimes tagged both as a point (POI) and as an explicit shape (e.g. a building polygon, or rooms / areas within a larger building). In effect, this produces near-duplicates in the parsed data. The following example illustrates this with the pre-written healthcare parser:

[4]:
# Query yields point and multi-polygon data
gdf_che_health = ex.extract_cis(path_che_dump, 'healthcare')
print(f'Number of results: {len(gdf_che_health)}')
INFO:osm_flex.extract:query is finished, lets start the loop
extract points: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 1943/1943 [00:14<00:00, 133.47it/s]
INFO:osm_flex.extract:query is finished, lets start the loop
extract multipolygons: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 1407/1407 [00:52<00:00, 26.60it/s]
Number of results: 3350
[5]:
gdf_che_health = sy.remove_contained_points(gdf_che_health)
print(f'Number of results after removing points contained in polygons: {len(gdf_che_health)}')
Number of results after removing points contained in polygons: 3257
[6]:
gdf_che_health = sy.remove_contained_polys(gdf_che_health)
print(f'Number of results after removing polygons contained in larger polygons: {len(gdf_che_health)}')
Number of results after removing polygons contained in larger polygons: 2455
[7]:
gdf_che_health = sy.remove_exact_duplicates(gdf_che_health)
print(f'Number of results after removing exact geometrical duplicates: {len(gdf_che_health)}')
Number of results after removing exact geometrical duplicates: 2455
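
The idea behind the first two steps can be illustrated on toy geometries. The following is a minimal sketch using shapely only; it is not the osm_flex implementation, and the helper names `drop_contained_points` and `drop_contained_polys` as well as the coordinates are hypothetical. It assumes that a point counts as a near-duplicate when it lies inside any polygon, and a polygon when it lies inside a larger polygon.

```python
from shapely.geometry import Point, Polygon

POLY_TYPES = ("Polygon", "MultiPolygon")


def drop_contained_points(geoms):
    """Keep every non-point geometry and only those points outside all polygons."""
    polys = [g for g in geoms if g.geom_type in POLY_TYPES]
    kept = []
    for g in geoms:
        if g.geom_type == "Point" and any(p.contains(g) for p in polys):
            continue  # point duplicates a place already mapped as a shape
        kept.append(g)
    return kept


def drop_contained_polys(geoms):
    """Drop polygons that lie entirely within a larger polygon."""
    polys = [g for g in geoms if g.geom_type in POLY_TYPES]
    kept = []
    for g in geoms:
        if g.geom_type in POLY_TYPES and any(
            other is not g and other.area > g.area and other.contains(g)
            for other in polys
        ):
            continue  # e.g. a ward mapped inside a hospital footprint
        kept.append(g)
    return kept


# Hypothetical toy data: a hospital footprint, a ward inside it, and two POIs
hospital = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
ward = Polygon([(1, 1), (3, 1), (3, 3), (1, 3)])
poi_inside = Point(5, 5)
poi_outside = Point(20, 20)

geoms = [hospital, ward, poi_inside, poi_outside]
step1 = drop_contained_points(geoms)
step2 = drop_contained_polys(step1)
print(len(geoms), len(step1), len(step2))
```

The pairwise loops are quadratic and only suitable for illustration; on country-scale data a spatial index or spatial join would be the more realistic choice.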

Example: Remove small polygons

[10]:
gdf_che_forest = ex.extract(path_che_dump, 'multipolygons', ['landuse', 'name'], "landuse='forest'")
print(f'Number of results: {len(gdf_che_forest)}')
INFO:osm_flex.extract:query is finished, lets start the loop
extract multipolygons: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 76220/76220 [00:43<00:00, 1762.81it/s]
Number of results: 76220
[11]:
gdf_che_forest = gdf_che_forest.to_crs('epsg:21781') # metre-based CRS for Switzerland
min_area = 100

gdf_che_forest = sy.remove_small_polygons(gdf_che_forest, min_area) # remove all polygons with area < 100 m2 (min_area is always in the units of the respective CRS)
print(f'Number of results after removal of small polygons: {len(gdf_che_forest)}')
Number of results after removal of small polygons: 74484
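
Why the reprojection to a metre-based CRS matters can be seen from a small calculation. The sketch below is pure Python and independent of osm_flex: it computes the area of a hypothetical square (0.0002 degrees per side, located roughly near Bern) once in degree coordinates and once in approximate metres. The degree-to-metre conversion is a crude local approximation (about 111,320 m per degree of latitude) used for illustration only; it is an assumption of this sketch and not a substitute for `to_crs`.

```python
import math


def shoelace_area(coords):
    """Planar polygon area from a list of (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical square, 0.0002 degrees per side (illustrative coordinates)
lon0, lat0, d = 7.45, 46.95, 0.0002
square_deg = [(lon0, lat0), (lon0 + d, lat0), (lon0 + d, lat0 + d), (lon0, lat0 + d)]

# Crude local conversion to metres (assumption: ~111,320 m per degree of latitude)
M_PER_DEG_LAT = 111_320.0
m_per_deg_lon = M_PER_DEG_LAT * math.cos(math.radians(lat0))
square_m = [
    ((lon - lon0) * m_per_deg_lon, (lat - lat0) * M_PER_DEG_LAT)
    for lon, lat in square_deg
]

area_deg = shoelace_area(square_deg)  # in square degrees
area_m = shoelace_area(square_m)      # in (approximate) square metres

min_area = 100
print(f"below threshold in degrees: {area_deg < min_area}")
print(f"below threshold in metres:  {area_m < min_area}")
```

In degree units the area is many orders of magnitude below 100, so a threshold of `min_area = 100` applied to an unprojected GeoDataFrame would discard essentially every polygon, whereas in metres the same square measures a few hundred square metres and is kept.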