Geographic Information Systems Asked by OHTO on March 18, 2021
I've been trying to compute the entropy values described below after spatially joining LSOA codes onto a pandas DataFrame (originally a GeoPandas GeoDataFrame). My naive for-loop approach is far too slow for the 30000 LSOAs in England and Wales.
Would a geospatial operation (Python-based, QGIS-based, etc.) be more appropriate than trying to find a groupby solution?
My dataset looks like this, and I'm trying to find the entropy per 'lsoa11cd' area (a UK-specific geospatial code identifying an area/polygon).
The k_means_5 column holds the k-means cluster label of each point, with k = 5.
test01[['k_means_5','lsoa11cd']].head(10)
   k_means_5   lsoa11cd
0          1  E01019240
1          1  E01019240
2          1  E01019238
3          1  E01019240
4          1  E01019240
5          1  E01019240
6          1  E01019316
7          1  E01019316
8          1  E01019316
9          1  E01019316
I can get the entropy for a single LSOA with the clumsy (possibly incorrect?) steps below, but I would like to do it more efficiently, as it would take about 10 days to iterate over all the 'lsoa11cd' values with a for loop.
len(test01.loc[test01['lsoa11cd'] == 'E01019238']['k_means_5'])
51
test01.loc[test01['lsoa11cd'] == 'E01019238']['k_means_5'].value_counts()
1 40
2 6
0 5
Name: k_means_5, dtype: int64
test01.loc[test01['lsoa11cd'] == 'E01019238']['k_means_5'].value_counts()/len(test01.loc[test01['lsoa11cd'] == 'E01019238']['k_means_5'])
1 0.784314
2 0.117647
0 0.098039
Name: k_means_5, dtype: float64
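For reference, the quantity being computed per LSOA is the Shannon entropy of the cluster shares, taken in base 5 so that it runs from 0 (all points in one cluster) to 1 (points spread evenly over the five clusters). Written out for the shares above, with the usual convention that a share of zero contributes nothing (the symbols p_k, n_k and H are notation introduced here, not from the original post):

```latex
H = -\sum_{k=0}^{4} p_k \log_5 p_k,
\qquad p_k = \frac{n_k}{\sum_{j} n_j}

H_{\mathrm{E01019238}}
  = -\left( \tfrac{40}{51}\log_5\tfrac{40}{51}
          + \tfrac{6}{51}\log_5\tfrac{6}{51}
          + \tfrac{5}{51}\log_5\tfrac{5}{51} \right)
  \approx 0.416
```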
from scipy.stats import entropy
test01.loc[test01['lsoa11cd'] == 'E01019238', 'entropy_k_means_5'] = entropy(test01.loc[test01['lsoa11cd'] == 'E01019238']['k_means_5'].value_counts()/len(test01.loc[test01['lsoa11cd'] == 'E01019238']['k_means_5']), base=5)
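A side note on the line above: scipy.stats.entropy rescales its input so that it sums to 1, so the raw counts can be passed in directly and the division by the group length is not needed. A minimal sketch using the counts shown for E01019238 (the variable names are only for illustration):

```python
import numpy as np
import pandas as pd
from scipy.stats import entropy

# Cluster counts reported above for LSOA E01019238 (51 points in total).
counts = pd.Series({1: 40, 2: 6, 0: 5}, name="k_means_5")

# entropy() normalises its input to sum to 1, so counts and shares agree.
h_from_counts = entropy(counts.to_numpy(), base=5)
h_from_shares = entropy((counts / counts.sum()).to_numpy(), base=5)

# The same value written out by hand: -sum(p * log(p)) / log(5).
p = counts.to_numpy() / counts.sum()
h_manual = -(p * np.log(p)).sum() / np.log(5)

print(h_from_counts, h_from_shares, h_manual)
```

All three numbers should agree, which suggests the single-LSOA calculation is correct and only the looping is the problem.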
I have tried a groupby approach (below), but of course the last step can't work given the mismatched shapes of the two objects I'm passing. Any expert advice?
g1 = test01.groupby('lsoa11cd')['k_means_5'].transform('count')
0 172
1 172
2 51
3 172
4 172
...
1758295 70
1758296 59
1758297 87
1758298 87
1758299 122
Name: k_means_5, Length: 1758300, dtype: int64
g2 = test01.groupby('lsoa11cd')['k_means_5'].value_counts()
lsoa11cd   k_means_5
E01000001  4            17
           3             9
E01000002  3            24
           4            22
E01000003  4            13
                        ..
W01001956  0            42
           4             3
W01001957  3            23
           4             9
W01001958  4             9
Name: k_means_5, Length: 64908, dtype: int64
entropy(g2 / g1, base=5)  # Nope! g1 is indexed by row and g2 by (lsoa11cd, k_means_5), so the two do not align
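One way the per-LSOA entropy could be computed without a Python loop is to tabulate the counts per LSOA and cluster once, then apply scipy.stats.entropy row-wise. This is a minimal sketch, not a tested solution for the full 1.7 million rows: the toy rows standing in for test01 and the name lsoa_entropy are made up for illustration.

```python
import pandas as pd
from scipy.stats import entropy

# Toy stand-in for test01, using the same two-column layout (made-up rows).
test01 = pd.DataFrame({
    "k_means_5": [1, 1, 1, 2, 0, 1, 1, 3, 4, 4],
    "lsoa11cd": ["E01019240"] * 2 + ["E01019238"] * 3
                + ["E01019316"] * 2 + ["E01000001"] * 3,
})

# One row per LSOA, one column per cluster label, cells = point counts.
counts = pd.crosstab(test01["lsoa11cd"], test01["k_means_5"])

# entropy() normalises each row to shares; empty cells contribute zero.
lsoa_entropy = pd.Series(
    entropy(counts.to_numpy(), base=5, axis=1),
    index=counts.index,
    name="entropy_k_means_5",
)

# Broadcast the per-LSOA value back onto every point row.
test01["entropy_k_means_5"] = test01["lsoa11cd"].map(lsoa_entropy)
print(lsoa_entropy)
```

The count table has only one row per LSOA and one column per cluster, so it stays small even for the full dataset. A plainer but likely slower alternative would be test01.groupby('lsoa11cd')['k_means_5'].agg(lambda s: entropy(s.value_counts(), base=5)), which calls entropy once per group.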