Data Science Asked by Jaskaran Singh Puri on March 7, 2021
I’m new to this, so please let me know if my logic of comparing cosine similarity and k-means is incorrect.

I got a set of 4 clusters from k-means, and now I’m interested in Cluster No. 1. For this cluster, I take the average of all values for each column and keep it aside.
Now I have a test sample, for which I run the k-means prediction, and I get 1 as the output, meaning it belongs to Cluster No. 1. That is good, but my use case here is to measure how close the sample was to falling into Cluster No. 1 even if it had not been assigned to it.

To resolve this, I thought of computing the cosine similarity between my test sample and the vector of column-wise averages. In this case, I get a similarity of just 5%.
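For reference, here is a minimal sketch of what I described above (the random data, dimensions, and variable names are just placeholders for my actual setup):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

X = np.random.rand(200, 5)          # placeholder for my training data
kmeans = KMeans(n_clusters=4, random_state=0).fit(X)

# Column-wise average of the points assigned to Cluster No. 1
cluster_1_mean = X[kmeans.labels_ == 1].mean(axis=0)

test_sample = np.random.rand(1, 5)  # placeholder for my test sample
label = kmeans.predict(test_sample)[0]

# Cosine similarity between the test sample and the cluster average
sim = cosine_similarity(test_sample, cluster_1_mean.reshape(1, -1))[0, 0]
print(label, sim)
```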
I’m not sure which is the better interpretation for my use case, i.e. getting the probability/closeness of a sample belonging to a specific cluster. I know I can use the cluster labels as y variables and build a multi-class classification model, but I want to keep this as unsupervised as possible. Please guide.
Hi. Try a Gaussian Mixture Model (GMM). It is similar to k-means but differs in a few ways. In a nutshell, think of k-means as a hard clustering model, where each sample is assigned to exactly one cluster, whereas a GMM is a soft clustering technique that gives the density (probability) of each Gaussian mixture component (think of a component as a cluster) for a data point. You can get both the labels and the probability scores from the model. Try it and see if it helps your use case. It is available in the scikit-learn library.
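A minimal sketch with scikit-learn's GaussianMixture, assuming the same kind of feature matrix and test sample as in the question (the random data is only a placeholder):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.rand(200, 5)          # placeholder training data
gmm = GaussianMixture(n_components=4, random_state=0).fit(X)

test_sample = np.random.rand(1, 5)  # placeholder test sample
label = gmm.predict(test_sample)[0]        # hard assignment, like k-means
probs = gmm.predict_proba(test_sample)[0]  # probability for each component
print(label, probs)
```

`probs` sums to 1 across the 4 components, so the entry for component 1 directly answers "how close was the sample to falling in that cluster".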
Another approach, in case you do not want a GMM: 1. take the cluster centers from k-means; 2. compute the distance from your test sample vector to each center, then pass the negated distances through a softmax function to get a pseudo-probability score for the sample against all the cluster centers (negating ensures that closer centers get higher scores).
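A minimal sketch of that idea, again with placeholder data (the softmax is taken over negative distances so that the nearest center receives the highest score):

```python
import numpy as np
from scipy.special import softmax
from sklearn.cluster import KMeans

X = np.random.rand(200, 5)       # placeholder training data
kmeans = KMeans(n_clusters=4, random_state=0).fit(X)

test_sample = np.random.rand(5)  # placeholder test sample (1-D vector)

# Euclidean distance from the test sample to each of the 4 cluster centers
dists = np.linalg.norm(kmeans.cluster_centers_ - test_sample, axis=1)

# Softmax over negative distances: closer center -> higher pseudo-probability
scores = softmax(-dists)
print(scores, scores.argmax())
```

Note these scores are not calibrated probabilities; they only rank how close the sample is to each center.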
Answered by Aj_MLstater on March 7, 2021