Data Science Asked on June 17, 2021
I am trying to cluster data that has 24 categorical features. From my research, many people recommend an algorithm such as K-Modes. I ran K-Modes on my data, and the best run had a cost of 27069.0, which seems pretty high.
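Whether a cost like 27069.0 is "high" depends on the size of the data. In the standard K-Modes formulation with matching dissimilarity, the cost is the total number of (row, feature) cells that disagree with the mode of their cluster, so it grows with both the number of rows and the number of features. Dividing it by rows × features gives an average mismatch rate that is easier to interpret. The minimal NumPy sketch below illustrates this. It assumes the categories are already integer-coded, uses made-up toy data, and uses a simple farthest-first initialization chosen for this sketch. That initialization is not what any particular library does.

```python
import numpy as np


def mismatches(X, centroids):
    """(n_rows, k) matrix: how many features of each row differ from each centroid."""
    return (X[:, None, :] != centroids[None, :, :]).sum(axis=2)


def column_modes(X):
    """Most frequent integer code in each column (ties go to the smaller code)."""
    return np.array([np.bincount(col).argmax() for col in X.T])


def kmodes(X, k, max_iter=20):
    """Minimal K-Modes: matching dissimilarity, per-column modes as centroids."""
    # Farthest-first initialization (an assumption made for this sketch).
    centroids = [X[0]]
    while len(centroids) < k:
        d = mismatches(X, np.array(centroids)).min(axis=1)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)

    for _ in range(max_iter):
        labels = mismatches(X, centroids).argmin(axis=1)
        # Move each centroid to the mode of its cluster (keep it if the cluster is empty).
        new = np.array([
            column_modes(X[labels == j]) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.array_equal(new, centroids):
            break
        centroids = new

    dist = mismatches(X, centroids)
    labels = dist.argmin(axis=1)
    cost = int(dist.min(axis=1).sum())  # total mismatches = the K-Modes cost
    return labels, centroids, cost


# Hypothetical toy data: 6 rows x 4 categorical features, already integer-coded.
X = np.array([
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [2, 2, 2, 2],
    [2, 2, 2, 1],
    [2, 1, 2, 2],
])
labels, centroids, cost = kmodes(X, k=2)
mismatch_rate = cost / X.size  # fraction of cells that disagree with their cluster mode
print(labels, cost, mismatch_rate)  # labels [0 0 0 1 1 1], cost 4, rate ≈ 0.167
```

By the same logic, a cost of 27069.0 over 24 features means roughly 27069 / (n_rows × 24) of the cells differ from their cluster's mode. Comparing that rate across different numbers of clusters is more informative than the raw cost.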
Some of my features have only a few values, such as P, O, C, T, so I thought I could encode them. But others have many different values. Any tips on a clustering algorithm or some other approach? I would like to use Python.
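The many-valued features are the awkward part because one-hot encoding adds one indicator column per distinct value. The encoded width is therefore the sum of the per-feature cardinalities. Here is a small pandas sketch with hypothetical columns (`kind` with few values, `code` with one value per row):

```python
import pandas as pd

# Hypothetical toy data: "kind" has few values (like P, O, C, T),
# "code" has as many distinct values as there are rows.
df = pd.DataFrame({
    "kind": ["P", "O", "C", "T", "P", "O", "C", "T"],
    "code": [f"c{i}" for i in range(8)],
})

# One-hot encoding adds one 0/1 indicator column per distinct value.
onehot = pd.get_dummies(df)

cardinality = df.nunique()  # distinct values per feature
print(cardinality.to_dict())
print(onehot.shape)  # (8, 12): 4 + 8 indicator columns
```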
EDIT: What about using Gower distance on the data and then using K-Means on that?
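One caveat on the EDIT: K-Means works on feature vectors (scikit-learn's `KMeans` uses Euclidean distances), not on a precomputed distance matrix. A Gower distance matrix is therefore usually paired with a method that accepts precomputed distances, such as hierarchical clustering, k-medoids, or DBSCAN. The sketch below assumes NumPy and SciPy, uses made-up rows, and swaps in average-linkage hierarchical clustering in place of K-Means. It relies on the fact that, for purely categorical features with equal weights and no missing values, Gower distance reduces to the fraction of features that differ. SciPy's `hamming` metric computes exactly that on integer-coded columns.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical rows with three categorical features each.
rows = [
    ["P", "red", "small"],
    ["P", "red", "tiny"],
    ["O", "red", "small"],
    ["T", "blue", "large"],
    ["T", "blue", "huge"],
    ["C", "blue", "large"],
]


def integer_codes(rows):
    """Replace each column's category labels with integer codes."""
    arr = np.array(rows)
    return np.column_stack(
        [np.unique(arr[:, j], return_inverse=True)[1] for j in range(arr.shape[1])]
    )


X = integer_codes(rows)

# Condensed distance vector: fraction of features on which two rows differ.
# For all-categorical, equally weighted features this equals Gower distance.
d = pdist(X, metric="hamming")
D = squareform(d)  # full n x n matrix, e.g. for estimators that take metric="precomputed"

# Average-linkage hierarchical clustering on the precomputed distances,
# cut into (at most) 2 flat clusters.
Z = linkage(d, method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

With mixed numeric and categorical features, Gower also range-normalizes the numeric columns, so the plain Hamming shortcut above no longer applies.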
You can one-hot encode all your features first. This leaves you with a sparse, high-dimensional feature space. To address that, you can train an auto-encoder to map the one-hot vectors into a lower-dimensional, denser representation, and then run a clustering method such as k-means on that representation.
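Here is a minimal sketch of that pipeline, assuming scikit-learn and pandas are available. It is not necessarily the answerer's setup. The data is synthetic and hypothetical. Instead of a dedicated deep-learning autoencoder, it uses a shallow stand-in: scikit-learn's `MLPRegressor` is trained to reconstruct its own one-hot input through a small `tanh` bottleneck, and the bottleneck activations are read off the learned weights (`coefs_`, `intercepts_`) and passed to `KMeans`. The bottleneck size, iteration count, and number of clusters are arbitrary choices.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_rows, n_features = 200, 24

# Hypothetical data: two hidden groups; each group draws every feature
# from its own pair of category labels.
group = rng.integers(0, 2, size=n_rows)
df = pd.DataFrame({
    f"f{j}": np.where(group == 0,
                      rng.choice(["a", "b"], size=n_rows),
                      rng.choice(["c", "d"], size=n_rows))
    for j in range(n_features)
})

# 1) One-hot encode: one 0/1 column per (feature, value) pair;
#    each row has exactly one 1 per original feature.
onehot = pd.get_dummies(df).to_numpy(dtype=float)

# 2) Shallow autoencoder stand-in: learn to reconstruct the one-hot input
#    through a small tanh bottleneck layer.
bottleneck = 4
ae = MLPRegressor(hidden_layer_sizes=(bottleneck,), activation="tanh",
                  max_iter=500, random_state=0)
ae.fit(onehot, onehot)

# Bottleneck activations = the dense, low-dimensional encoding.
embedding = np.tanh(onehot @ ae.coefs_[0] + ae.intercepts_[0])

# 3) Cluster the dense encoding with k-means.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embedding)
print(onehot.shape, embedding.shape, np.bincount(labels))
```

A deeper autoencoder (for example in PyTorch or Keras) would follow the same three steps. The number of clusters still has to be chosen separately, for example by comparing silhouette scores.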
Answered by OmG on June 17, 2021