geopandas - How to spatially join based on maximally overlapping feature

Question

I have a gdf of climate zones in CA and one of counties. Multiple climate zones often overlap a county, but I want to assign a county to be the climate zone that takes up the most area in a county.

Below is my code thus far, but it oddly says only climate zone 16 overlaps any counties, so there is a big problem.

import geopandas as gpd

counties = gpd.read_file('tl_2017_us_county/tl_2017_us_county.shp')
counties = counties[counties['STATEFP'] == '06']

climate_zones = gpd.read_file('BuildingClimateZonesMap/CA_Building_Standards_Climate_Zones.shp')
counties.crs = climate_zones.crs
test = gpd.sjoin(climate_zones, counties, how='left', op='intersects')

Here are the links to the data:
http://www.energy.ca.gov/maps/renewable/building_climate_zones.html
https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2017&layergroup=Counties+%28and+equivalent%29

Jon · Accepted Answer

Ignore my comment. After looking through the "test" dataframe and your code, I think I see what's happening.

When you run the line

counties.crs = climate_zones.crs

you are simply telling Geopandas to interpret the counties CRS as the climate_zones CRS, but you are not actually reprojecting it, which you need to do before intersecting. Change that line to this:

counties = counties.to_crs(climate_zones.crs)

and you'll see that "test" contains what you expect.

Note that when I ran the intersection, I got the message

RuntimeWarning: invalid value encountered in ? (vectorized)  outputs = ufunc(*inputs)

I'm not sure if that's a critical problem or not, but you should make sure the test geodataframe is correct (pull it up in QGIS). I looked at it in QGIS and it seemed fine--all the Climate Zones were there and based on the few spot-checks I did, their intersecting counties were correct.

bubalis · Answer

This function should work for this problem:
def max_area_attr_join(gdf1, gdf2, id_col, attribute_col):
    #spatial overlay
    s = gpd.overlay(gdf1, gdf2, how = 'intersection') 
    s['area_ov'] = s.geometry.area
    
    gb = s.groupby(id_col)[['area_ov']].max()
    
    #add the column 'area_ov' to the original gdf.
    gdf1 = gdf1.merge(gb, left_on = id_col, right_index = True) 
    
    #drop extraneous columns from the joined gdf
    s = s[['area_ov', attribute_col]]
    
    #merge the attribute column to the original gdf.
    gdf1 = gdf1.merge(s, on = 'area_ov')

return gdf1.drop(columns = ['area_ov'])

geopandas - How to spatially join based on maximally overlapping feature

2 Answers

Add your own answers!

Ask a Question