Geographic Information Systems Asked by rolo on July 22, 2021
I’m a newbie in implementing GIS on Python, but the lack of any GIS software at the office has forced me to learn about it.
I have a huge grid (a multipolygon
) that I’m currently filling out with counts of points using spatial index. When using a subset of the grid the code works just fine but when I use anything bigger than that then it takes forever.
This is the code that I’m currently using
# build the r-tree index
sindex = points.sindex
# find the points that intersect with each subpolygon and add them to points_within_geometry
points_within_grid = pd.DataFrame()
# grid-level dataseet
grid_data = pd.DataFrame()
i=1
for poly in grid:
clear_output(wait=True)
# buffer by the <1 micron dist to account for any space lost in the quadrat cutting
# otherwise may miss point(s) that lay directly on quadrat line
poly = poly.buffer(1e-14).buffer(0)
# find approximate matches with r-tree, then precise matches from those approximate ones
possible_matches_index = list(sindex.intersection(poly.bounds))
possible_matches = points.iloc[possible_matches_index]
precise_matches = possible_matches[possible_matches.intersects(poly)]
# count number of firms within polygon
x=pd.DataFrame({'grid_index': i,
'geometry':[Polygon(poly.envelope)],
'n_firms': len(precise_matches)})
for poi in range(1,9):
x['sic_'+str(poi)]=len(precise_matches[precise_matches.sic1==poi])
grid_data=grid_data.append(x)
# points within the grid
points_within_grid = points_within_grid.append(precise_matches)
print("Current Progress: ", np.round(i/len(grid)*100,0),"%")
i=i+1
# drop duplicate points, if buffered poly caused an overlap on point(s) that lay directly on a quadrat line
points_within_grid = points_within_grid.drop_duplicates(subset=['x', 'y','poi_id'])
points_outside_grid = points[~points.poi_id.isin(points_within_grid.poi_id)]
So, I’ve been thinking about using multiprocessing.Pool
to make it faster. I’ve been reading online but don’t really know how to do it. The first step would be to define a function that can be introduced to the Pool
.
So far I have this function,
def grid_inter(poly)
# buffer by the <1 micron dist to account for any space lost in the quadrat cutting
# otherwise may miss point(s) that lay directly on quadrat line
poly = poly.buffer(1e-14).buffer(0)
# find approximate matches with r-tree, then precise matches from those approximate ones
possible_matches_index = list(sindex.intersection(poly.bounds))
possible_matches = points.iloc[possible_matches_index]
precise_matches = possible_matches[possible_matches.intersects(poly)]
# count number of firms within polygon
x=pd.DataFrame({'grid_index': i,
'geometry':[Polygon(poly.envelope)],
'n_firms': len(precise_matches)})
for poi in range(1,9):
x['sic_'+str(poi)]=len(precise_matches[precise_matches.sic1==poi])
return x
But I’m stuck in the next step which I think should be something like this
pool = mp.Pool(mp.cpu_count())
results = pool.map(grid_inter, [poly for poly in grid])
pool.close()
How can I make ithis work?
This is what I ended up doing. This creates a count of firms with certain characteristics within a grid. It also creates an id based on the centroid of the grid cell that the data is filling in.
def grid_inter(poly):
#initialize data
x=pd.DataFrame({'grid_index_x': poly.centroid.x,
'grid_index_y': poly.centroid.y,
'geometry':[Polygon(poly.envelope)]})
# buffer by the <1 micron dist to account for any space lost in the quadrat cutting
# otherwise may miss point(s) that lay directly on quadrat line
poly = poly.buffer(1e-9).buffer(0)
# find approximate matches with r-tree, then precise matches from those approximate ones
possible_matches_index = list(sindex.intersection(poly.bounds))
possible_matches = points.iloc[possible_matches_index]
precise_matches = possible_matches[possible_matches.intersects(poly)]
# count number of firms within polygon
x['n_firms']=len(precise_matches)
for amenity in points.amenity.unique():
x['poi_'+str(amenity)]=len(precise_matches[precise_matches.amenity==amenity])
for poi in points.naics2.unique():
x['NAICS_'+str(int(poi))]=len(precise_matches[precise_matches.naics2==poi])
return x
start = time.time()
sindex = points.sindex
pool = Pool(cpu_count()-1)
grid_poi_data = pool.map(grid_inter,grid)
pool.close()
pool.join()
grid_poi_data = pd.concat(grid_poi_data)
end = time.time()
print (end - start)
Thanks for the previous comments. They helped a lot.
Answered by rolo on July 22, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP