Geographic Information Systems Asked by Shffl on May 10, 2021
I have two GeoSeries, consisting of points and polygons. I want to find the polygon in dataframe B that is closest to each point in dataframe A. The polygons are rooftops from https://github.com/Microsoft/USBuildingFootprints, which I already geocoded using https://github.com/Bonsanto/polygon-geohasher.
I’m currently computing the 7 digit geohash of each point, and merging on buildings in neighboring 7 digit geohashes using geotools.expand
. This is better than doing a full outer merge, but relies on explode
. My general approach was to minimize calls of distance
, since computing the distance from a point to a polygon is expensive.
The code is a bit slow (~20 minutes to match 100k rows), and so I’m trying to make it faster. My searching points to r-trees, but the sklearn implementation seems to be geared towards identifying the nearest point, rather than the nearest polygon. I’m interested in the left join rather than the right join.
Code below:
import pandas as pd
import numpy as np
import geopandas
import geohash
from shapely.geometry import Point
def match_func(df):
point = Point(df.iloc[0,:][['lat', 'long']])
df.loc[:, 'dist'] = geopandas.GeoSeries(df.geometry).distance(point)
df = df.sort_values('dist')
return(df.head(1))
def main(file):
x = import_points()
rooftop_df = import_rooftops()
x['id'] = range(1, len(x) + 1)
def neighbor_fun(lat,long):
return(geohash.encode(lat,long,precision=7))
func1 = np.vectorize(neighbor_fun)
x['g7_neighbor'] = func1(x['lat'], x['long'])
x = x.explode('g7_neighbor')
x = x.merge(rooftop_df, left_on='g7_neighbor', right_on='geo7')
xg = x.groupby('id')
xout = pd.concat([match_fun2(df_group) for group_name, df_group in xg])
return(xout)
Following Stefan's suggestion (Find nearest polygon (from GeoSeries) to point (from GeoSeries)), I wrote up the following using STRtree
from Shapely.
Profiling indicates that, even ignoring the overhead associated with creating the tree, each nearest
call is multiple seconds, so that it's slower than my old method. I suspect this is because the tree contains every rooftop, but have to think more about how to subset the points and rooftops data intelligently to get smaller trees.
import pandas as pd
import numpy as np
import geopandas
import geohash
from shapely.geometry import Point
from shapely.strtree import STRtree
def main(file):
x = import_points()
rooftop_df = import_rooftops()
x['row'] = range(1, len(x) + 1)
tree = STRtree(rooftop_df.geometry)
row_ dict = pd.Series(rooftop_df.row.values,index=rooftop_df.geometry.apply(id)).to_dict()
def func1(x, y):
return(accpt_dict[id(tree.nearest(Point(x,y)))])
func1_v = np.vectorize(func1)
x['accpt_id'] = func1_v(x.lat, x.long)
return(x)
Answered by Shffl on May 10, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP