Geographic Information Systems Asked on March 29, 2021
I have a gdf of climate zones in CA and one of counties. Multiple climate zones often overlap a county, but I want to assign a county to be the climate zone that takes up the most area in a county.
Below is my code thus far, but it oddly says only climate zone 16 overlaps any counties, so there is a big problem.
import geopandas as gpd
counties = gpd.read_file('tl_2017_us_county/tl_2017_us_county.shp')
counties = counties[counties['STATEFP'] == '06']
climate_zones = gpd.read_file('BuildingClimateZonesMap/CA_Building_Standards_Climate_Zones.shp')
counties.crs = climate_zones.crs
test = gpd.sjoin(climate_zones, counties, how='left', op='intersects')
Here are the links to the data:
http://www.energy.ca.gov/maps/renewable/building_climate_zones.html
https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2017&layergroup=Counties+%28and+equivalent%29
Ignore my comment. After looking through the "test" dataframe and your code, I think I see what's happening.
When you run the line
counties.crs = climate_zones.crs
you are simply telling Geopandas to interpret the counties CRS as the climate_zones CRS, but you are not actually reprojecting it, which you need to do before intersecting. Change that line to this:
counties = counties.to_crs(climate_zones.crs)
and you'll see that "test" contains what you expect.
Note that when I ran the intersection, I got the message
RuntimeWarning: invalid value encountered in ? (vectorized) outputs = ufunc(*inputs)
I'm not sure if that's a critical problem or not, but you should make sure the test geodataframe is correct (pull it up in QGIS). I looked at it in QGIS and it seemed fine--all the Climate Zones were there and based on the few spot-checks I did, their intersecting counties were correct.
Correct answer by Jon on March 29, 2021
This function should work for this problem:
def max_area_attr_join(gdf1, gdf2, id_col, attribute_col):
#spatial overlay
s = gpd.overlay(gdf1, gdf2, how = 'intersection')
s['area_ov'] = s.geometry.area
gb = s.groupby(id_col)[['area_ov']].max()
#add the column 'area_ov' to the original gdf.
gdf1 = gdf1.merge(gb, left_on = id_col, right_index = True)
#drop extraneous columns from the joined gdf
s = s[['area_ov', attribute_col]]
#merge the attribute column to the original gdf.
gdf1 = gdf1.merge(s, on = 'area_ov')
return gdf1.drop(columns = ['area_ov'])
Answered by bubalis on March 29, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP