Data Science Asked by Ângelo D on June 4, 2021
I have an already clustered data set (I wanna keep my x and y), where there’s clearly a small group of elements in the middle that don’t follow the expected patterns.
I can select them manually, but I wonder if there’s a way of automating the selection part of these elements, efficiently.
Something like using just the grouping part of a clustering algorithm. I’ve been trying it with a distance threshold, but it doesn’t produce good results when the group doesn’t form a circular cluster.
It would be helpful to know which clustering technique you are using.
You can use
If you are looking for something other than a circular cluster and you need clusters within clusters, I would try DBSCAN. It locates regions of high density, separates out low-density points as outliers, and is not limited to circular cluster shapes.
If you are using Python, you can use DBSCAN from scikit-learn (sklearn):
from sklearn.cluster import DBSCAN
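As a rough sketch of how that could look: the data below is made up (two dense blobs plus three scattered points between them), and the eps and min_samples values are illustrative guesses that would need tuning on the real data set. DBSCAN gives the label -1 to points it treats as noise, so those are the ones to pull out.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Hypothetical stand-in data: two dense blobs and a few scattered points between them.
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(100, 2))
stragglers = np.array([[2.5, 2.5], [2.0, 3.2], [3.1, 1.9]])
X = np.vstack([blob_a, blob_b, stragglers])

# eps and min_samples are assumptions for this toy data, not recommended defaults.
db = DBSCAN(eps=0.5, min_samples=5).fit(X)

labels = db.labels_        # cluster index per point; -1 marks noise
noise_mask = labels == -1  # boolean selector for the points DBSCAN left unclustered
outliers = X[noise_mask]

print(f"{noise_mask.sum()} points flagged as noise")
```

The original x and y stay untouched here; noise_mask only selects rows, so it can be applied to any other array or DataFrame with the same row order.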
I hope that helps!
Answered by daco on June 4, 2021
You have it right that you want your clustering to tell you which points are most anomalous. For k-means clustering, those are the points farthest from the centroid of their assigned cluster.
I don't see a reason to expect that the anomalies form a cluster themselves. If that's what you're expecting, you may need to compute something else, such as a second clustering of only the points that lie beyond a distance threshold.
Also consider a Gaussian mixture model, which is similar to k-means except that it treats cluster assignments as soft and probabilistic. The outliers under that model might make more sense.
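A minimal sketch of that idea with scikit-learn's KMeans, on made-up data (two blobs and one point between them). The 99th-percentile cutoff is an arbitrary assumption; any threshold on the distances would work the same way.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical data: two groups plus one point sitting between them (row 200).
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.3, size=(100, 2)),
    [[2.5, 2.5]],
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Distance from each point to the centroid of the cluster it was assigned to.
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

# Assumed cutoff: flag whatever lies beyond the 99th percentile of distances.
threshold = np.quantile(dist, 0.99)
anomalous = np.flatnonzero(dist > threshold)

print("most anomalous row:", dist.argmax())
```

Sorting by dist instead of thresholding gives a ranking, which is often easier to inspect by eye than a hard cutoff.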
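Something along these lines, using scikit-learn's GaussianMixture on the same kind of made-up data. The number of components and the 1% cutoff are assumptions chosen for the toy example; score_samples returns the log-likelihood of each point under the fitted mixture, so the lowest values are the least typical points.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical data: two groups plus one point sitting between them (row 200).
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.3, size=(100, 2)),
    [[2.5, 2.5]],
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

resp = gmm.predict_proba(X)          # soft assignment: one probability per component
log_density = gmm.score_samples(X)   # log-likelihood of each point under the mixture

# Assumed cutoff: flag the 1% of points with the lowest likelihood.
cutoff = np.quantile(log_density, 0.01)
low_density = np.flatnonzero(log_density < cutoff)

print("least likely row:", log_density.argmin())
```

Unlike the k-means distance, the likelihood accounts for each component's shape and spread, so a point can be flagged even if it is not far from a centroid in plain Euclidean terms.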
Answered by Sean Owen on June 4, 2021