
Geospatial way to optimise cluster entropy calculation per LSOA (Polygon)

Geographic Information Systems Asked by OHTO on March 18, 2021

I've been trying to compute the entropy values below after doing a spatial join of the LSOA codes onto my pandas DataFrame (which started out in GeoPandas). My naive for-loop approach is very slow for the 30,000 LSOAs in England and Wales.

Maybe a geospatial operation (Python- or QGIS-based, etc.) would be more appropriate than trying to find a groupby solution?

My dataset looks like this, and I'm trying to find the entropy per 'lsoa11cd' area (a UK-specific geospatial code representing an area/polygon).
The k_means_5 column holds the cluster labels from a k-means run with k = 5.

test01[['k_means_5','lsoa11cd']].head(10)
   k_means_5   lsoa11cd
0          1  E01019240
1          1  E01019240
2          1  E01019238
3          1  E01019240
4          1  E01019240
5          1  E01019240
6          1  E01019316
7          1  E01019316
8          1  E01019316
9          1  E01019316

I can get the entropy for a single LSOA with the clumsy (possibly incorrect?) lines below, but I would like to do it more efficiently, as iterating over the 'lsoa11cd' values with a for loop would take 10 days.

len(test01.loc[test01['lsoa11cd'] == 'E01019238']['k_means_5'])
51
test01.loc[test01['lsoa11cd'] == 'E01019238']['k_means_5'].value_counts()
1    40
2     6
0     5
Name: k_means_5, dtype: int64
test01.loc[test01['lsoa11cd'] == 'E01019238']['k_means_5'].value_counts()/len(test01.loc[test01['lsoa11cd'] == 'E01019238']['k_means_5'])
1    0.784314
2    0.117647
0    0.098039
Name: k_means_5, dtype: float64
    
from scipy.stats import entropy
# Select the rows of one LSOA once, then write its entropy back to those rows
mask = test01['lsoa11cd'] == 'E01019238'
test01.loc[mask, 'entropy_k_means_5'] = entropy(test01.loc[mask, 'k_means_5'].value_counts(normalize=True), base=5)
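
For reference, the quantity computed for a single LSOA above can be written out as follows. The symbols $n_k$ (number of rows in the LSOA assigned to cluster $k$) and $p_k$ (the corresponding share) are my own notation, not part of the data; the last line just plugs in the counts shown for E01019238.

```latex
p_k = \frac{n_k}{\sum_{j} n_j},
\qquad
H_{\text{LSOA}} = -\sum_{k\,:\,p_k > 0} p_k \log_5 p_k
\\[6pt]
\text{E01019238:}\quad
H = -\left(
  \tfrac{40}{51}\log_5\tfrac{40}{51}
  + \tfrac{6}{51}\log_5\tfrac{6}{51}
  + \tfrac{5}{51}\log_5\tfrac{5}{51}
\right)
```

With base 5 and five clusters, the value lies between 0 (all rows in one cluster) and 1 (rows spread evenly over all five clusters).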

I have tried the following, but of course the last step can't work with the shape of the objects I'm passing in. Any expert advice?

g1 = test01.groupby('lsoa11cd')['k_means_5'].transform('count')
0          172
1          172
2           51
3          172
4          172
          ... 
1758295     70
1758296     59
1758297     87
1758298     87
1758299    122
Name: k_means_5, Length: 1758300, dtype: int64

g2 = test01.groupby('lsoa11cd')['k_means_5'].value_counts()
lsoa11cd   k_means_5
E01000001  4                             17
           3                              9
E01000002  3                             24
           4                             22
E01000003  4                             13
                                         ..
W01001956  0                             42
           4                              3
W01001957  3                             23
           4                              9
W01001958  4                              9
Name: k_means_5, Length: 64908, dtype: int64

entropy(g2 / g1, base=5)  # Nope! g1 has one value per row, g2 one per (lsoa11cd, k_means_5) pair
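
One possible way around the shape mismatch, sketched under assumptions rather than tested on the full dataset: build a count table with one row per LSOA and one column per cluster label, then let `scipy.stats.entropy` work row-wise through its `axis` argument (available in SciPy 1.4 and later). The small `test01` frame below is a made-up stand-in for the real data, and `counts` and `lsoa_entropy` are names introduced here for illustration.

```python
import pandas as pd
from scipy.stats import entropy

# Hypothetical toy data standing in for the real test01 frame
test01 = pd.DataFrame({
    "lsoa11cd": ["E01019240"] * 4 + ["E01019238"] * 4,
    "k_means_5": [1, 1, 1, 1, 1, 1, 2, 0],
})

# One row per LSOA, one column per cluster label; absent clusters count as 0
counts = pd.crosstab(test01["lsoa11cd"], test01["k_means_5"])

# entropy() normalises each row to sum to 1, so raw counts can be passed directly
lsoa_entropy = pd.Series(
    entropy(counts.to_numpy(), base=5, axis=1),
    index=counts.index,
    name="entropy_k_means_5",
)

# Broadcast the per-LSOA value back onto every row of the original frame
test01["entropy_k_means_5"] = test01["lsoa11cd"].map(lsoa_entropy)
```

Since this is a single pass over the rows plus one array operation on an (LSOAs x clusters) table, no Python-level loop over the 'lsoa11cd' values is needed. If the per-LSOA values are wanted on the polygons instead, `lsoa_entropy` could be joined onto the LSOA GeoDataFrame on its 'lsoa11cd' column.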
