Geographic Information Systems Asked by James Hinkley on July 13, 2021
I have created the following function that measures the distance from one point in a GDF to all points in another GDF and returns a table back with the shortest distance for each point. It works well for one point however I neglected the fact I have a table of 4000 points and so it takes 10 mins. I have ran it in PostGIS and can get it down to less than a second. Is there a way to do this in Python that could match the PostGIS speed?
def get_distance_to(gdf_in, aoi_df, aoi):
dist_df_list = list()
for row in range(len(gdf_in)):
single_row = gdf_in.iloc[row]
distances = aoi_df.geometry.distance(single_row.geometry)
dist_list = distances.to_list()
closest_aoi = min(dist_list)
single_row["dist_to_"+aoi] = closest_aoi
df = single_row.to_frame().T
dist_df_list.append(df)
completed_distances = pd.concat(dist_df_list, ignore_index=True, sort=False)
return completed_distances
my input tables looks something like this
and the output table looks like this
Iterate over rows in a (Geo)DataFrame in (Geo)Pandas is very slow, see Optimum approach for iterating over a DataFrame for example
Iteration in Pandas is an anti-pattern and is something you should only do when you have exhausted every other option. (How to iterate over rows in a DataFrame in Pandas)
You can try to use (Geo)DataFrame.apply()
and shapely:nearest point as in GeoPandas: Find nearest point in other dataframe without for
iteration (see comment)
import geopandas as gpd
from shapely.geometry import Point
from shapely.ops import nearest_points
gpd1 = gpd.read_file("point1.shp") # red points
gpd2 = gpd.read_file("point2.shp") # blue points
pts3 = gpd2.geometry.unary_union
def near(point, pts=pts3):
# find the nearest point and return the corresponding value
nearest = gpd2.geometry == nearest_points(point, pts)[1]
return gpd2[nearest].id.values[0],gpd2[nearest].geometry.values[0]
gpd1['Nearest'] = gpd1.apply(lambda row: near(row.geometry)[0], axis=1)
gpd1['geom2'] = gpd1.apply(lambda row: near(row.geometry)[1], axis=1)
print(gpd1)
id geometry Nearest geom2
0 1 POINT (-0.99013 0.48096) 3 POINT (-0.77574 0.64739)
1 2 POINT (-1.00987 0.08039) 4 POINT (-0.73060 0.10860)
2 3 POINT (-0.71932 -0.13117) 5 POINT (-0.57827 -0.08039)
3 4 POINT (-0.90268 -0.28914) 5 POINT (-0.57827 -0.08039)
Compute the distance
gpd1['distance'] = gpd1.apply(lambda row: row.geometry.distance(row.geom2), axis=1)
gpd1.drop('geom2', axis=1, inplace=True)
print(gpd1)
id geometry Nearest distance
0 1 POINT (-0.99013 0.48096) 3 0.271406
1 2 POINT (-1.00987 0.08039) 4 0.280688
2 3 POINT (-0.71932 -0.13117) 5 0.149905
3 4 POINT (-0.90268 -0.28914) 5 0.385759
Answered by gene on July 13, 2021
For any spatial operation of this kind, you should always try to use spatial index. If you are interested in the minimal distance only, the following should give you relatively performant option.
import geopandas as gpd
from shapely.geometry import Point
import pandas as pd
import random
gdf = gpd.GeoDataFrame(geometry=[Point(random.randint(0, 1000), random.randint(0, 1000)) for _ in range(1000)])
gdf2 = gpd.GeoDataFrame(geometry=[Point(random.randint(0, 1000), random.randint(0, 1000)) for _ in range(1000)])
def get_nearest_distance(left, right, initial_buffer):
"""get distance from left to right"""
buffered = left.buffer(initial_buffer)
distances = []
for i in range(len(buffered)):
geom = buffered.geometry.iloc[i]
query = right.sindex.query(geom)
while query.size == 0:
query = right.sindex.query(geom.buffer(b))
b += initial_buffer
distances.append(right.iloc[query].distance(left.geometry.iloc[i]).min())
return pd.Series(distances, index=left.index)
gdf['distance_to_x'] = get_nearest_distance(gdf, gdf2, 50)
For 1000 to 1000 points, it is less than a second, compared to about a minute @gene's code takes.
To make it efficient, you should guess the initial_buffer
distance in which you think will be only a few points. If none is in, then it expands buffer until it hits some.
Generally, if you want the best performance from GeoPandas, you should use the latest version (this code requires 0.8) and optional dependency pygeos (https://geopandas.readthedocs.io/en/latest/getting_started/install.html#using-the-optional-pygeos-dependency), which can speed up the code above by the order of magnitude.
Answered by martinfleis on July 13, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP