TransWikia.com

How to add 'other' as one group to clustering algorithm inference pipeline

Data Science Asked on August 15, 2021

I have a few clustering algorithms tuned to produce 5 clusters. I want a 6th cluster, so that new data which does not belong to any of the initial 5 clusters falls into it.

The 6th cluster (say, an 'other' category) would consist of all data points that do not belong to any of the 5 clusters.

P.S.: All the data given initially belongs to those 5 clusters. So, say, I run k-means with the number of clusters set to 5. During inference I want to add a 6th cluster, so that any point which does not belong to one of the given clusters is put in this category, depending on a distance threshold. I have textual data. Please let me know which clustering algorithm I should go with: DBSCAN, SOM, etc.

One Answer

Clustering doesn't work like this: for example, k-means assigns an instance to the closest centroid, and since there is always a closest centroid, there is always a cluster that an instance "belongs to", however far away it is.
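To illustrate the point, here is a minimal sketch of the nearest-centroid assignment rule in plain Python. The five 2-D centroids and the helper name `nearest_centroid` are made up for the example; with real text the points would be feature vectors of some kind.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (the k-means assignment rule)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical centroids standing in for five fitted clusters.
centroids = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0), (2.5, 2.5)]

# A point right next to the first centroid.
near_label = nearest_centroid((0.2, -0.1), centroids)

# A point extremely far from every centroid still receives one of the 5 labels.
far_label = nearest_centroid((1000.0, 1000.0), centroids)
```

The rule has no way to answer "none of them": the far point is simply given the label of whichever centroid happens to be least distant.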

So you need a different approach if you want to allow for an instance being "not in any group":

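  • redo the clustering on the full set of instances, new ones included, so that a new group can emerge
  • apply a first step which detects outliers, and only assign the remaining instances to a cluster
  • train a one-class classification model for every cluster, and treat an instance rejected by all of them as "other"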
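As a rough sketch of the second and third options, the code below uses a per-cluster distance threshold as a very simple stand-in for a one-class model. It is not a specific method from this answer: the `OTHER` label, the `slack` factor, the helper names and the toy 2-D data (two clusters instead of five, for brevity) are all assumptions made for illustration.

```python
import math

OTHER = -1  # hypothetical label meaning "not in any cluster"

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def fit_thresholds(points, labels, centroids, slack=1.5):
    """Per-cluster radius: largest training distance to the centroid, times `slack`."""
    radii = [0.0] * len(centroids)
    for p, k in zip(points, labels):
        radii[k] = max(radii[k], math.dist(p, centroids[k]))
    return [r * slack for r in radii]

def predict_with_other(point, centroids, thresholds):
    """Nearest centroid, unless the point lies outside that cluster's radius."""
    distances = [math.dist(point, c) for c in centroids]
    k = distances.index(min(distances))
    return k if distances[k] <= thresholds[k] else OTHER

# Toy training data with cluster labels already assigned (e.g. by k-means).
train_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
train_labels = [0, 0, 0, 1, 1, 1]

centroids = [
    centroid([p for p, k in zip(train_points, train_labels) if k == 0]),
    centroid([p for p, k in zip(train_points, train_labels) if k == 1]),
]
thresholds = fit_thresholds(train_points, train_labels, centroids)

inside_first = predict_with_other((0.5, 0.5), centroids, thresholds)
inside_second = predict_with_other((10.5, 10.2), centroids, thresholds)
in_between = predict_with_other((5.0, 5.0), centroids, thresholds)
```

Here the point halfway between the two groups gets `OTHER`, while points near a centroid keep their cluster label. The threshold rule and the `slack` value are the part that would need tuning on real data, and a proper one-class classifier or outlier detector could replace the radius check.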

Answered by Erwan on August 15, 2021
