Data Science Asked by Overflow2341313 on February 12, 2021
What’s the current methodology for clustering geospatial data by features?
Example: I have a demographic dataset. Let's say it contains average home price and population density.
So, one relationship of interest here would be home price versus population density. The tricky part is how the clusters get formed. For example, an affluent area with high population density isn't the same as an affluent area with low population density. A basic distance metric might not take this into account, since low and high values could offset each other and produce similar distances. This makes me think some form of weighted clustering is needed to pull the centroids.
I'm not sure which methodology takes this into account.
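A minimal sketch of the offsetting concern, using made-up values that are assumed to be already standardized (the area names and numbers are hypothetical). Two affluent areas can sit at the same Euclidean distance from an "average" area, yet they are still far apart from each other, because each per-feature difference is squared before the sum is taken.

```python
import numpy as np

# Hypothetical, already-standardized values: [home price, population density]
reference = np.array([0.0, 0.0])         # an "average" area
affluent_sparse = np.array([1.0, -1.0])  # high price, low density
affluent_dense = np.array([1.0, 1.0])    # high price, high density

def euclidean(a, b):
    # Each per-feature difference is squared before summing,
    # so a positive and a negative difference cannot cancel out.
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Both affluent areas are the same distance from the reference area...
d_sparse = euclidean(reference, affluent_sparse)
d_dense = euclidean(reference, affluent_dense)
# ...but the distance between the two affluent areas is still large.
d_between = euclidean(affluent_sparse, affluent_dense)
print(d_sparse, d_dense, d_between)
```

So the "offset" only matters relative to a third point; a clustering algorithm that compares areas pairwise (or to centroids) still separates dense from sparse affluent areas, provided both features are on comparable scales.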
I assume you are trying to find a suitable distance metric based on features of different areas (although spatial distances could also easily be plugged in). In that case, I would first make sure the different features are correctly scaled, for example to zero mean and unit variance.
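A minimal sketch of that scaling step with NumPy, on four hypothetical areas with made-up prices and densities. In raw units the price column (hundreds of thousands) swamps the density column (thousands), so density barely affects the distances; after z-scoring each column, both features contribute on a comparable scale.

```python
import numpy as np

# Hypothetical areas (made-up numbers), one row per area:
# columns = [average home price, population density (people per km^2)]
X = np.array([
    [450_000.0, 8_000.0],  # A: expensive, dense
    [460_000.0,   300.0],  # B: expensive, sparse
    [120_000.0, 7_500.0],  # C: cheap, dense
    [110_000.0,   250.0],  # D: cheap, sparse
])

def standardize(X):
    """Rescale each column to zero mean and unit variance (z-scores).
    Assumes no column is constant (std > 0)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

Z = standardize(X)

# Raw units: the price gap dominates, so A looks much closer to B than to C.
raw_ab, raw_ac = euclidean(X[0], X[1]), euclidean(X[0], X[2])
# Standardized: the density gap now counts about as much as the price gap.
scaled_ab, scaled_ac = euclidean(Z[0], Z[1]), euclidean(Z[0], Z[2])
print(raw_ab, raw_ac)
print(scaled_ab, scaled_ac)
```

scikit-learn's `StandardScaler` performs the same z-score rescaling if you are already using that library for the clustering itself.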
If the result does not seem right, I would also try looking at different distance metrics. A simple alternative example is the L1 norm:
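$$L_1(a, b) = \sum_x |x_a - x_b|$$
where the sum runs over the features x, and x_a, x_b are the values of feature x for areas a and b.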
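A short sketch of the L1 norm next to the usual Euclidean (L2) distance, plus a pairwise L1 matrix built with NumPy broadcasting. The function names are just illustrative helpers, not part of any library.

```python
import numpy as np

def l1_distance(a, b):
    # L1 (Manhattan) distance: sum of absolute per-feature differences
    return float(np.sum(np.abs(a - b)))

def l2_distance(a, b):
    # Euclidean distance, for comparison
    return float(np.sqrt(np.sum((a - b) ** 2)))

def pairwise_l1(X):
    # (n, 1, d) - (1, n, d) -> (n, n, d), then sum |.| over the feature axis
    return np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)

a = np.array([1.0, -1.0])
b = np.array([-0.5, 1.0])
print(l1_distance(a, b))  # |1.5| + |-2| = 3.5
print(l2_distance(a, b))  # sqrt(2.25 + 4) = 2.5

X = np.array([[1.0, -1.0], [-0.5, 1.0], [0.0, 0.0]])
D = pairwise_l1(X)
```

If SciPy is available, `scipy.spatial.distance.cdist(X, X, metric="cityblock")` computes the same matrix. Note that k-means is built around squared Euclidean distance, so clustering with L1 usually means switching to a method that accepts an arbitrary distance matrix, such as hierarchical clustering or k-medoids.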
Answered by Jan Šimbera on February 12, 2021