What would be a good way to use clustering for outlier detection?

Question

For simplicity, let's assume the feature space is the XY plane.

Charlie Greenbacker · Answer

Perhaps you could cluster the items; the items furthest from the center of their nearest cluster would then be candidates for outliers.

Javierfdr · Answer

PFCM, from Bezdek, is a clustering algorithm that is very robust against outliers.
In the paper that introduces it, Bezdek and his co-authors propose Possibilistic Fuzzy C-Means (PFCM), an improvement on the earlier variants of fuzzy and possibilistic clustering. The algorithm is particularly good at detecting outliers and at preventing them from influencing the clustering. With PFCM you can therefore find which points are identified as outliers (those with low typicality in every cluster) and, at the same time, obtain a very robust fuzzy clustering of your data.
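A minimal NumPy sketch of the PFCM alternating updates, not a reference implementation. It assumes Euclidean distance, the illustrative choices a = b = 1 and m = η = 2, initialization from a plain fuzzy c-means run, and each γ_i set to the FCM-weighted mean squared distance of cluster i. The names `fcm` and `pfcm` are hypothetical helpers, not a library API. Points whose largest typicality over all clusters is small are treated as outlier candidates.

```python
import numpy as np


def _sq_dists(X, V):
    # Squared Euclidean distance from every center (rows of V) to every point; shape (c, n).
    return ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)


def _memberships(D2, m):
    # Fuzzy c-means memberships: u_ik proportional to D2_ik^(-1/(m-1)); each column sums to 1.
    inv = D2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)


def fcm(X, c, m=2.0, n_iter=100, seed=0):
    # Hypothetical helper: plain fuzzy c-means started from a random membership matrix.
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        U = _memberships(np.fmax(_sq_dists(X, V), 1e-12), m)
    return U, V


def pfcm(X, c, a=1.0, b=1.0, m=2.0, eta=2.0, n_iter=100, seed=0):
    # Hypothetical PFCM sketch: returns memberships U, typicalities T and centers V.
    U, V = fcm(X, c, m=m, seed=seed)
    D2 = np.fmax(_sq_dists(X, V), 1e-12)
    # gamma_i: FCM-weighted mean squared distance to center i (assumption: scale factor K = 1).
    gamma = ((U ** m) * D2).sum(axis=1) / (U ** m).sum(axis=1)
    for _ in range(n_iter):
        D2 = np.fmax(_sq_dists(X, V), 1e-12)
        U = _memberships(D2, m)
        # Typicality: close to 1 near a center, close to 0 far from it.
        T = 1.0 / (1.0 + (b * D2 / gamma[:, None]) ** (1.0 / (eta - 1.0)))
        W = a * U ** m + b * T ** eta
        V = (W @ X) / W.sum(axis=1, keepdims=True)
    return U, T, V


# Usage: three Gaussian blobs in the XY plane plus two far-away points (indices 150 and 151).
rng = np.random.default_rng(42)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([ctr + 0.5 * rng.standard_normal((50, 2)) for ctr in blob_centers])
X = np.vstack([X, [[30.0, 30.0], [-20.0, 25.0]]])

U, T, V = pfcm(X, c=3)
score = T.max(axis=0)  # low typicality in every cluster -> outlier candidate
print(np.argsort(score)[:2])  # indices of the two least typical points
```

Starting FCM from a random membership matrix places all initial centers near the bulk of the data, so an isolated point is unlikely to capture a cluster of its own.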

Has QUIT--Anony-Mousse · Answer

If your data is roughly Gaussian, Gaussian mixture modeling can be used for outlier detection. Points with a low density under every cluster (mixture component) are likely to be outliers.

This works well in idealized scenarios.
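A minimal sketch with scikit-learn's `GaussianMixture`, assuming the number of components is known (3 here) and using illustrative synthetic data. `score_samples` returns the log of the mixture density at each point, which is low only when the point is unlikely under every component. The 2% cut-off is an arbitrary assumption for the example, not a recommended value.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Illustrative data: three Gaussian blobs plus two points far from all of them (indices 150, 151).
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([ctr + 0.5 * rng.standard_normal((50, 2)) for ctr in blob_centers])
X = np.vstack([X, [[30.0, 30.0], [-20.0, 25.0]]])

# Fit a 3-component mixture; n_init runs EM several times and keeps the best fit.
gmm = GaussianMixture(n_components=3, n_init=5, random_state=0).fit(X)

# Log of the mixture density at each point.
log_density = gmm.score_samples(X)

# Assumption: flag the lowest 2% of densities as outlier candidates.
threshold = np.percentile(log_density, 2)
outliers = np.flatnonzero(log_density <= threshold)
print(outliers)
```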

TheGrimmScientist · Answer

Apply your clustering algorithm.
Calculate the distance from each data point to the center of its assigned cluster.
Label the data points furthest from their center as outliers.

Randomly generating 100 data points from three Gaussians, clustering them with k-means, and marking the 10 data points furthest from a center produced a graph of the result; see this notebook for the full example.
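A minimal sketch of the three steps and the experiment described above, assuming scikit-learn's `KMeans`; the Gaussian centers, spread and seed are illustrative assumptions, since the original notebook's settings are not known.

```python
import numpy as np
from sklearn.cluster import KMeans

# 100 random points drawn from three Gaussians (illustrative centers, unit spread).
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 5.0]])
sizes = [34, 33, 33]
X = np.vstack([ctr + rng.standard_normal((n, 2)) for ctr, n in zip(blob_centers, sizes)])

# Step 1: apply the clustering algorithm.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Step 2: distance from each point to the center of its assigned cluster.
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

# Step 3: label the N points furthest from their center as outliers.
n_outliers = 10
outlier_idx = np.argsort(dist)[-n_outliers:]
print(sorted(outlier_idx.tolist()))
```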

You will already have had to decide what "distance" means in order to run a clustering algorithm. It is still up to you to decide how far from a center a point must be to count as an outlier. In this example I simply picked the N most distant data points, though you will probably want to flag any data point that lies more than a certain number of standard deviations from its center.
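A sketch of the standard-deviation variant, assuming the cut-off is computed separately for each cluster as the mean plus k standard deviations of that cluster's point-to-center distances, with k = 3. The data, the seed and k are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative data: three tight blobs plus two planted far-away points (indices 150 and 151).
rng = np.random.default_rng(1)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([ctr + 0.5 * rng.standard_normal((50, 2)) for ctr in blob_centers])
X = np.vstack([X, [[30.0, 30.0], [-20.0, 25.0]]])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

# Flag points more than k standard deviations above their own cluster's mean distance.
k = 3.0
is_outlier = np.zeros(len(X), dtype=bool)
for label in np.unique(km.labels_):
    in_cluster = km.labels_ == label
    d = dist[in_cluster]
    is_outlier[in_cluster] = d > d.mean() + k * d.std()
print(np.flatnonzero(is_outlier))
```

Distances to a center are non-negative and skewed rather than normally distributed, so a fixed k = 3 cut can also flag a few ordinary points at the edge of a clean cluster, while a very distant outlier inflates its own cluster's standard deviation; k is worth tuning for your data.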

preems · Answer

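If your data points form dense regions and the noise points lie away from them, you can try the DBSCAN algorithm, which labels points that do not belong to any dense region as noise.

Tweak its parameters (the neighborhood radius and the minimum number of points per dense neighborhood) until you get a good fit.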
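A minimal sketch with scikit-learn's `DBSCAN`; the data and the parameter values `eps=1.5` and `min_samples=5` are assumptions chosen for this toy example. Points labeled `-1` are the noise points, i.e. the outlier candidates.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Illustrative data: three dense blobs plus two isolated points (indices 150 and 151).
rng = np.random.default_rng(2)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([ctr + 0.5 * rng.standard_normal((50, 2)) for ctr in blob_centers])
X = np.vstack([X, [[30.0, 30.0], [-20.0, 25.0]]])

# eps: neighborhood radius; min_samples: points within eps (including the point itself)
# needed for a core point. Both values are tuned to this toy data, not general defaults.
labels = DBSCAN(eps=1.5, min_samples=5).fit(X).labels_

# DBSCAN labels points that belong to no dense region as -1 (noise).
noise_idx = np.flatnonzero(labels == -1)
n_clusters = len(set(labels.tolist()) - {-1})
print(n_clusters, noise_idx)
```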
