Clustering with Only Categorical Features

Question

I am trying to cluster data that has only categorical features (24 of them). I have done some research and found many people recommending K-Modes. I ran K-Modes on my data, and the best run had a cost of 27069.0, which seems pretty high.

Some of my features have only a few values, such as P, O, C, T, so I thought I could encode them. But others have many different values. Any tips on a clustering algorithm or some other approach? I would like to use Python.

EDIT: What about using Gower distance on the data and then using K-Means on that?
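For reference, when every feature is categorical, the Gower distance between two rows reduces to the fraction of features on which they differ. Below is a minimal sketch of that special case in plain Python. The function name `gower_categorical` and the toy rows are made up for illustration and are not from the original data. Note that k-means works on feature vectors rather than on a pairwise distance matrix, so a matrix like this is more commonly passed to a method that accepts precomputed distances.

```python
def gower_categorical(rows):
    """Pairwise Gower dissimilarity for purely categorical rows.

    With only categorical features, each feature contributes 0 (match)
    or 1 (mismatch), and the distance is the mean over features.
    """
    n = len(rows)
    n_features = len(rows[0])
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            mismatches = sum(a != b for a, b in zip(rows[i], rows[j]))
            dist[i][j] = dist[j][i] = mismatches / n_features
    return dist


# Hypothetical toy data: three rows, three categorical features.
rows = [
    ("P", "red", "yes"),
    ("P", "red", "no"),
    ("T", "blue", "no"),
]
dist = gower_categorical(rows)
```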

OmG · Answer

You can first one-hot encode all your features. This leaves you with a sparse, high-dimensional feature space. To deal with that, you can use an autoencoder to map the one-hot vectors to a denser, low-dimensional representation, and then run a clustering method such as k-means on the encoded vectors.
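A rough sketch of this pipeline, using only NumPy so that each step is visible. Everything here is an assumption made for illustration: the toy rows, the helper names (`one_hot`, `train_autoencoder`, `kmeans`), the code size of 2, and the learning rate. The autoencoder is a tiny one-hidden-layer network trained with plain gradient descent; for real data you would more likely use a deep learning framework and a library k-means.

```python
import numpy as np


def one_hot(rows):
    """One-hot encode a list of equal-length tuples of category labels."""
    blocks = []
    for col in zip(*rows):
        levels = sorted(set(col))
        index = {level: k for k, level in enumerate(levels)}
        block = np.zeros((len(col), len(levels)))
        block[np.arange(len(col)), [index[v] for v in col]] = 1.0
        blocks.append(block)
    return np.hstack(blocks)


def train_autoencoder(X, code_dim=2, epochs=2000, lr=0.5, seed=0):
    """Tanh encoder, linear decoder, full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, code_dim))
    b1 = np.zeros(code_dim)
    W2 = rng.normal(scale=0.1, size=(code_dim, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # encoded representation
        err = (H @ W2 + b2) - X           # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error.
        g_out = 2.0 * err / err.size
        g_pre = (g_out @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_pre)
        b1 -= lr * g_pre.sum(axis=0)
    return np.tanh(X @ W1 + b1), losses


def kmeans(Z, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns labels from the last assignment step."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    for _ in range(iters):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = Z[labels == c].mean(axis=0)
    return labels, centers


# Hypothetical toy data: six rows, three categorical features.
rows = [
    ("P", "red", "yes"),
    ("P", "red", "no"),
    ("O", "red", "yes"),
    ("T", "blue", "no"),
    ("C", "blue", "no"),
    ("T", "green", "no"),
]
X = one_hot(rows)                              # sparse 0/1 matrix
codes, losses = train_autoencoder(X, code_dim=2)
labels, centers = kmeans(codes, k=2)
```

The number of clusters and the size of the encoded layer are both choices you would have to tune for your own 24 features; nothing in this sketch says what values suit your data.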

Answered by OmG on June 17, 2021
