TransWikia.com

PCA and k-means for categorical variables?

Data Science Asked by akashdubey on March 1, 2021

I have a clustering task at hand. The data I have contains only categorical variables, so k-modes seemed like the best option. But I am not sure which data preprocessing steps it requires.

What I am doing right now is the following:

  • label encoding features which have ordinal values.

  • one hot encoding the others.

and that’s all I am doing for data preprocessing. After these steps my feature space grows from the original 4 features to 50. The best number of clusters I get is 17, with a silhouette score of 0.60.
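The two encoding steps described above can be sketched as follows. This is a minimal, dependency-free illustration: the toy rows, the column names `size` and `color`, and the ordering in `SIZE_ORDER` are assumptions made up for the example, not the asker's actual data.

```python
# Hypothetical toy data: column names and categories are made up for illustration.
rows = [
    {"size": "small", "color": "red"},
    {"size": "large", "color": "blue"},
    {"size": "medium", "color": "red"},
]

# Ordinal feature: map each level to its rank (label encoding).
SIZE_ORDER = {"small": 0, "medium": 1, "large": 2}


def one_hot(value, categories):
    """Return a 0/1 indicator list with a single 1 at the position of `value`."""
    return [1 if value == c else 0 for c in categories]


# Nominal feature: collect the categories in sorted order so columns are stable.
colors = sorted({r["color"] for r in rows})

# One encoded row = [ordinal rank] + [one indicator per color category].
encoded = [[SIZE_ORDER[r["size"]]] + one_hot(r["color"], colors) for r in rows]
feature_names = ["size"] + [f"color_{c}" for c in colors]
```

Each nominal column contributes one indicator column per distinct category, which is how a handful of original features can expand to dozens after encoding.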

Also, I think that running Principal Component Analysis (PCA) to reduce dimensions, or scaling the features, doesn’t make sense here, because if I did that I might as well use k-means. Would it be a good decision to run PCA and then use k-means on categorical variables?

One Answer

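No. PCA and k-means cannot be used on categorical variables; both require numerical variables. K-means computes cluster means and Euclidean distances, and PCA decomposes the variance of the data, and neither operation is meaningful for unordered category labels.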
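To make the contrast concrete, here is a minimal sketch of the two building blocks that k-modes (the method mentioned in the question) substitutes for the mean and the Euclidean distance: a per-attribute mode as the cluster centre, and a count of mismatching attributes as the dissimilarity. The function names and the toy records are hypothetical, and this is not a full clustering implementation.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Return the most frequent value of each attribute (a k-modes style centre)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


# Hypothetical cluster of records with two categorical attributes.
cluster = [("red", "suv"), ("red", "sedan"), ("blue", "suv")]

# There is no meaningful "mean" of red and blue, but the mode is well defined.
centroid = column_modes(cluster)
distances = [matching_dissimilarity(r, centroid) for r in cluster]
```

Here `centroid` is a valid record made of actual category values, whereas averaging the labels would require inventing numbers for them.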

Answered by Brian Spiering on March 1, 2021
