Data Science Asked by akashdubey on March 1, 2021
I have a clustering task at hand. The data I have contains only categorical variables, so k-modes seemed like the best option. But I am not sure which data preprocessing steps it requires.
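For readers unfamiliar with k-modes, here is a minimal, stdlib-only sketch of the idea: it works on the raw categories, measuring distance as the number of mismatched attributes and replacing each cluster center with the column-wise most frequent category. The function names and toy data are hypothetical. This is not the API of any library, and the random initialization is the simplest possible choice.

```python
import random
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of attributes on which two records take different categories."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Most frequent category in each column (ties: first value encountered)."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def k_modes(records, k, max_iter=100, seed=0):
    """Cluster tuples of categories; returns (labels, modes)."""
    rng = random.Random(seed)
    # Simple random initialization: k records sampled without replacement.
    modes = rng.sample(records, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode under matching dissimilarity.
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:  # no record changed cluster: converged
            break
        labels = new_labels
        # Update step: each mode becomes the column-wise mode of its members.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the old mode if a cluster ends up empty
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical toy data: each record is a tuple of categories.
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
]
labels, modes = k_modes(data, k=2)
print(labels, modes)
```

On real data, a maintained implementation such as the `kmodes` package on PyPI, with its smarter initializations and multiple restarts, is the more practical choice.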
What I am doing right now is the following:
Label encoding the features that have ordinal values.
One-hot encoding the others.
That’s all I am doing for preprocessing. After these steps my feature space grows from the original 4 features to 50. The best number of clusters I get is 17, with a silhouette score of 0.60.
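The two encoding steps described above can be sketched in plain Python. The columns `sizes` and `colors` and their categories are hypothetical toy data; on real data frames, pandas `get_dummies` or scikit-learn's `OrdinalEncoder` and `OneHotEncoder` do the same job.

```python
def ordinal_encode(values, order):
    """Replace each category with its rank in an explicitly supplied order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]


def one_hot_encode(values):
    """Return (levels, rows): one 0/1 indicator column per distinct category."""
    levels = sorted(set(values))
    rows = [[int(v == level) for level in levels] for v in values]
    return levels, rows


# Hypothetical toy columns: "size" is ordinal, "color" is nominal.
sizes = ["small", "large", "medium", "small"]
colors = ["red", "green", "blue", "red"]

size_codes = ordinal_encode(sizes, order=["small", "medium", "large"])
color_levels, color_rows = one_hot_encode(colors)

# Two original features become 1 + len(color_levels) encoded columns.
n_encoded_columns = 1 + len(color_levels)
print(size_codes)         # [0, 2, 1, 0]
print(color_levels)       # ['blue', 'green', 'red']
print(color_rows)         # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
print(n_encoded_columns)  # 4
```

Each nominal feature with m categories turns into m indicator columns, which is how 4 original features can grow to 50. k-modes, by contrast, compares the raw category values directly, so it can be run on the unencoded columns.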
Also, I don’t think applying feature scaling and Principal Component Analysis (PCA) to reduce dimensions makes sense here, because if I did that, I might as well use k-means. Would it be a good idea to run PCA and then use k-means on categorical variables?
No. PCA and k-means cannot be used on categorical variables. Both require numerical variables, because they rely on means, variances, and Euclidean distances, which are not meaningful for category labels.
Answered by Brian Spiering on March 1, 2021
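The point that k-means needs genuinely numerical variables can be illustrated with a small sketch, assuming a hypothetical nominal feature `color` with two equally arbitrary integer codings (`coding_a`, `coding_b`). The encoded distance between the same two categories changes with the coding, and a k-means style average of two codes can decode to a third category that neither point has.

```python
import math

# Two equally arbitrary integer codings of the same nominal feature.
coding_a = {"red": 0, "green": 1, "blue": 2}
coding_b = {"red": 0, "blue": 1, "green": 2}


def encoded_distance(x, y, coding):
    """Euclidean distance between two categories after integer encoding."""
    return math.dist([coding[x]], [coding[y]])


def matching_distance(x, y):
    """0 if the categories are equal, 1 otherwise (independent of any coding)."""
    return int(x != y)


dist_a = encoded_distance("red", "blue", coding_a)
dist_b = encoded_distance("red", "blue", coding_b)
match = matching_distance("red", "blue")

# A k-means style centroid of "red" and "blue" under coding_a...
centroid_code = (coding_a["red"] + coding_a["blue"]) / 2
# ...lands exactly on the code for "green".
decoded = {code: cat for cat, code in coding_a.items()}[centroid_code]

print(dist_a, dist_b, match)   # 2.0 1.0 1
print(centroid_code, decoded)  # 1.0 green
```

One-hot encoding removes the arbitrary ordering, but averaging indicator columns still yields fractional centers such as `[0.5, 0, 0.5]` that correspond to no category.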