TransWikia.com

How to get the probability/closeness of a sample belonging to a specific cluster?

Data Science Asked by Jaskaran Singh Puri on March 7, 2021

I’m new to this, so please let me know if my logic for comparing cosine similarity and k-means is incorrect.

I got a set of 4 clusters from k-means, and I’m interested in Cluster No. 1. For this cluster, I take the average of all values in each column (i.e. the cluster's mean vector) and set it aside.

Now I have a test sample. When I run k-means prediction on it, I get 1 as the output, meaning it belongs to Cluster No. 1, which is good for me. But my use case is to calculate, even when a sample does not belong to Cluster No. 1, how close it was to falling into that cluster.

Hence, to resolve this, I thought of computing the cosine similarity between my test sample and the vector of per-column averages for Cluster No. 1. In this case, I get a similarity of just 5%.

For my use case (getting the probability/closeness of a sample belonging to a specific cluster), I’m not sure which of the two, the k-means prediction or the cosine similarity, is the better interpretation.

I know I can use the cluster labels as y variables and build a multi-class classification model, but I want to keep it as unsupervised as possible. Please guide.

One Answer

Hi. Try a Gaussian Mixture Model (GMM). It is similar to k-means but differs in a few ways. In a nutshell, think of k-means as a hard clustering model, where each sample is assigned to exactly one cluster, whereas GMM is a soft clustering technique that gives, for each data point, the probability that it came from each Gaussian component of the mixture (consider each component a cluster). You can get both the labels and the probability scores from the model. Try it and see if it helps your use case. It's available in the scikit-learn library.
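A minimal sketch of the GMM suggestion, assuming scikit-learn is installed. The data here is synthetic (`make_blobs`) and stands in for the asker's features; the choice of 4 components simply mirrors the 4 k-means clusters in the question.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic stand-in data: 4 blobs in 2-D (an assumption, not the asker's data).
X, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.0, random_state=0)

# Fit a mixture with one Gaussian component per expected cluster.
gmm = GaussianMixture(n_components=4, random_state=0).fit(X)

sample = X[:1]                          # one "test" sample, shape (1, n_features)
label = gmm.predict(sample)[0]          # hard label, comparable to k-means output
probs = gmm.predict_proba(sample)[0]    # soft membership, one value per component

print("hard label:", label)
print("membership probabilities:", np.round(probs, 3))
```

Note that GMM component indices are assigned independently of any earlier k-means run, so "component 1" here is not necessarily the same group as "Cluster No. 1" from k-means. One way to line them up is to compare `gmm.means_` with the k-means centers.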

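Another approach, in case you do not want to use GMM: 1. take the cluster centers from k-means, 2. take your test sample vector, and 3. pass these to a softmax function to get a probability-like score of the sample for each cluster center. One common way to do this is to compute the distance from the sample to each center and apply softmax to the negated distances, so that closer centers get higher scores. These scores sum to 1 but are a heuristic measure of closeness, not calibrated probabilities.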
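The softmax idea can be sketched roughly as follows. The answer does not say exactly what is fed to softmax, so this version assumes negated Euclidean distances to the centers. The function name `soft_assign`, the `temperature` parameter, and the example centers are all hypothetical; in practice `centers` would be `kmeans.cluster_centers_` from a fitted scikit-learn `KMeans` model.

```python
import numpy as np

def soft_assign(sample, centers, temperature=1.0):
    """Softmax over negated distances from one sample to each cluster center.

    Hypothetical helper: a larger temperature gives flatter scores,
    a smaller one gives sharper scores.
    """
    sample = np.asarray(sample, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Euclidean distance from the sample to every center.
    dists = np.linalg.norm(centers - sample, axis=1)
    # Negate so that a smaller distance yields a larger score.
    logits = -dists / temperature
    logits -= logits.max()              # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Made-up centers for illustration only.
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])
sample = np.array([4.0, 0.5])

scores = soft_assign(sample, centers)
print("closeness scores:", np.round(scores, 3))
print("score for cluster 1:", round(float(scores[1]), 3))
```

Reading off `scores[1]` gives a closeness value for Cluster No. 1 even when the sample is assigned elsewhere, which is what the question asks for. Since the scores depend on the feature scales and on `temperature`, they are better suited to comparing clusters with each other than to being read as exact probabilities.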

Answered by Aj_MLstater on March 7, 2021
