TransWikia.com

Is feature importance from classification a good way to select features for clustering?

Data Science Asked by ricecooker on April 9, 2021

I have a large data set with many features (70). After preprocessing (removing features with too many missing values and those uncorrelated with the binary target variable), I have narrowed this down to 15 features. I am now fitting a decision tree classifier on these 15 features and the binary target so I can obtain feature importances. I would then keep the features with high importance as input for my clustering algorithm. Does using feature importance in this context make sense?

One Answer

It might make sense, but it depends what you're trying to do:

  • If the goal is to predict the binary target for any instance, a classifier will perform much better.
  • If the goal is to group instances by their similarity while indirectly taking the binary target into account, then clustering in this way makes sense. It corresponds to a more exploratory task: discovering patterns in the data while focusing on the features that are good indicators of the target (how useful this is depends on how good those indicators actually are).
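If the exploratory route is taken, the pipeline from the question can be sketched as follows. This is a minimal illustration with scikit-learn on synthetic data standing in for the 15 preprocessed features; the choice of `top_k = 5` and all hyperparameters are assumptions, not part of the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

# Synthetic stand-in for the 15 preprocessed features and the binary target
X, y = make_classification(n_samples=500, n_features=15,
                           n_informative=5, random_state=0)

# Fit a decision tree and rank features by impurity-based importance
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)
top_k = 5  # hypothetical cutoff; tune for your data
selected = np.argsort(tree.feature_importances_)[::-1][:top_k]

# Cluster on the selected features only
labels = KMeans(n_clusters=2, n_init=10,
                random_state=0).fit_predict(X[:, selected])
print("selected features:", selected)
```

Note that impurity-based importances can be biased toward high-cardinality features; permutation importance (`sklearn.inspection.permutation_importance`) is a common alternative ranking.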

Correct answer by Erwan on April 9, 2021

