Data Science Asked by Minka on May 1, 2021
I have a dataset of addresses (points) with several attributes: one that distinguishes the "sort" of address and one that contains a numerical value.
I want to cluster these points based on:
1. their distance to each other
2. the sort of address
However, the summed numerical attribute per cluster cannot exceed a certain threshold value.
In other words, the system needs to form clusters, but must stop growing a cluster as soon as the sum of the numerical values attached to its addresses reaches that threshold.
How do I even go about this? I have R, Python and other geo-applications at my disposal.
It seems that none of the existing clustering algorithms work. For k-means, for example, I need to know the number of clusters beforehand, which I don't.
It seems rather simple, but I can’t find a basic methodology to follow.
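Both criteria in the question (distance and sort) can be folded into a single dissimilarity that any distance-based clustering method accepts. The sketch below is one possible choice, not something from the thread: it assumes the addresses have projected x/y coordinates (for example in metres), and the `Address` record and the `sort_penalty` value are hypothetical names introduced for illustration.

```python
import math
from typing import NamedTuple


class Address(NamedTuple):
    # Hypothetical record layout: projected coordinates, the "sort" label,
    # and the numerical value that gets summed per cluster.
    x: float
    y: float
    sort: str
    value: float


def dissimilarity(a: Address, b: Address, sort_penalty: float = 500.0) -> float:
    """Euclidean distance plus a fixed penalty when the sorts differ.

    `sort_penalty` is a hypothetical tuning knob in the same units as the
    coordinates; larger values make mixed-sort clusters less likely.
    """
    dist = math.hypot(a.x - b.x, a.y - b.y)
    return dist + (sort_penalty if a.sort != b.sort else 0.0)


if __name__ == "__main__":
    p = Address(0.0, 0.0, "residential", 3.0)
    q = Address(300.0, 400.0, "residential", 2.0)
    r = Address(300.0, 400.0, "commercial", 4.0)
    print(dissimilarity(p, q))  # 500.0 (same sort, distance only)
    print(dissimilarity(p, r))  # 1000.0 (distance plus penalty)
```

An additive penalty is the simplest option; if addresses of different sorts must never share a cluster, setting the penalty to `math.inf` (or clustering each sort separately) enforces that strictly.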
Based on your comments, you are looking for agglomerative hierarchical clustering.
You start with each point as its own cluster. Then you repeatedly merge the pair of clusters that is closest according to some criterion (the linkage, e.g. single, complete or average linkage).
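The merge loop described above can be sketched in plain Python as follows. This is a minimal illustration assuming average linkage and a caller-supplied distance function `dist`; the function names are hypothetical. It recomputes every pairwise linkage at each step, so it is far slower than library implementations and only suitable for small examples.

```python
import math
from itertools import combinations


def average_linkage(a, b, dist):
    """Mean pairwise distance between two clusters (lists of points)."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def agglomerate(points, dist):
    """Naive agglomerative clustering with average linkage.

    Starts with every point as its own cluster and repeatedly merges the
    closest pair until one cluster remains. Returns the merge history as
    (linkage_distance, merged_cluster) tuples, i.e. the dendrogram.
    """
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > 1:
        # combinations() yields index pairs with i < j.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]], dist),
        )
        d = average_linkage(clusters[i], clusters[j], dist)
        merged = clusters[i] + clusters[j]
        # Delete the higher index first so index i stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
        history.append((d, merged))
    return history


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    for d, cluster in agglomerate(pts, math.dist):
        print(round(d, 2), cluster)
```

Any pairwise distance function can be passed as `dist`, for instance one that also penalises differing address sorts. The history is what a cut point is chosen from: stopping before the last few, large-distance merges yields the clusters.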
Typically you need to select a "cut point" after which you stop combining clusters. This is not an easy problem in general, and for the most part involves eyeballing your data until it "looks right", much like choosing k in k-means. In your case, however, you can use the external criterion you have in mind. You will need to recompute the summed value of the clusters involved at every merge step, and then simply stop merging once that value would pass the desired threshold.
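A minimal sketch of that stopping rule, assuming the threshold applies to each cluster separately: rather than halting the whole procedure at the first violation, it only considers merges whose combined sum stays within the threshold and stops when no such merge is left. This is one reading of the answer, not necessarily the answerer's exact intent. The function name and the optional `max_dist` cut-off (which keeps far-apart points from being merged just because capacity remains) are hypothetical additions.

```python
import math
from itertools import combinations


def capacity_clustering(points, values, threshold, dist, max_dist=math.inf):
    """Agglomerative clustering with a cap on each cluster's summed value.

    Clusters are lists of point indices. At every step the closest pair of
    clusters (average linkage) whose combined value does not exceed
    `threshold` is merged. The loop stops when no such pair remains, or when
    the closest allowed pair is farther apart than `max_dist`.
    """
    clusters = [[i] for i in range(len(points))]
    sums = [values[i] for i in range(len(points))]

    def linkage(a, b):
        return sum(dist(points[p], points[q]) for p in a for q in b) / (len(a) * len(b))

    while True:
        # Only pairs that respect the threshold are candidates.
        candidates = [
            (linkage(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
            if sums[i] + sums[j] <= threshold
        ]
        if not candidates:
            break
        d, i, j = min(candidates)
        if d > max_dist:
            break
        clusters[i] = clusters[i] + clusters[j]
        sums[i] += sums[j]
        # j > i, so deleting j leaves index i untouched.
        del clusters[j]
        del sums[j]
    return clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (0, 2), (10, 0)]
    vals = [4, 4, 4, 1]
    print(capacity_clustering(pts, vals, 8, math.dist))
    print(capacity_clustering(pts, vals, 8, math.dist, max_dist=5))
```

The number of clusters falls out of the threshold instead of being fixed in advance, which is what the question asks for. Note that a point whose own value already exceeds the threshold simply stays a singleton.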
Answered by shadowtalker on May 1, 2021