
Is there any optimal way on feature selection for more than one classification algorithms?

Data Science Asked by bbasaran on December 1, 2020

I have a wine dataset with 13 features and a target with 3 different wine classes, and I want to try k-NN, SVM with a linear kernel, and SVM with an RBF kernel on it.

My goal is to obtain the best classification accuracy, and to do so I need to decide:

  1. Which classification algorithm (k-NN, SVM with a linear kernel, or SVM with an RBF kernel) should I choose?

  2. Among all the features, which of them should be kept (e.g., via backward elimination, perhaps based on p-values)?

I have thought of using GridSearchCV with 3 estimators for the algorithms above, but in that case the feature selection part is the problem, as you can guess. Is there any optimal way to achieve both? Thanks!
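A minimal sketch of the GridSearchCV idea mentioned above: one Pipeline whose feature-selection step and classifier are both tuned together. The dataset loader, scoring, and grid values here are illustrative assumptions, not the exact setup from the question.

    # Sketch: tune feature selection and the choice of classifier in one search
    from sklearn.datasets import load_wine
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import GridSearchCV
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_wine(return_X_y=True)

    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("select", SelectKBest(f_classif)),   # keep the k best features
        ("clf", KNeighborsClassifier()),      # placeholder, swapped by the grid
    ])

    # One grid per candidate algorithm; GridSearchCV accepts a list of grids,
    # so the estimator itself effectively becomes a hyperparameter.
    param_grid = [
        {"select__k": [5, 8, 13],
         "clf": [KNeighborsClassifier()],
         "clf__n_neighbors": [3, 5, 7]},
        {"select__k": [5, 8, 13],
         "clf": [SVC(kernel="linear")],
         "clf__C": [0.1, 1, 10]},
        {"select__k": [5, 8, 13],
         "clf": [SVC(kernel="rbf")],
         "clf__C": [0.1, 1, 10],
         "clf__gamma": ["scale", 0.01, 0.1]},
    ]

    search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
    search.fit(X, y)
    print(search.best_params_, search.best_score_)

This way the cross-validation picks the number of features and the classifier jointly, instead of fixing the feature set before comparing algorithms.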

One Answer

If you want to know the feature importance for a dataset, you can obtain it by training a random forest. After training, you can read off a feature importance that is not tied to any particular downstream algorithm.
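A minimal sketch of that suggestion, using scikit-learn's bundled wine dataset as a stand-in for the asker's data (an assumption):

    # Sketch: train a random forest and inspect its feature importances
    from sklearn.datasets import load_wine
    from sklearn.ensemble import RandomForestClassifier

    data = load_wine()
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(data.data, data.target)

    # Impurity-based importance of each of the 13 features, largest first
    for name, imp in sorted(zip(data.feature_names, rf.feature_importances_),
                            key=lambda t: t[1], reverse=True):
        print(f"{name}: {imp:.3f}")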

Note that the feature importance reported by other, similar algorithms, such as some boosting algorithms, is tightly bound to the algorithm itself; that is not the case for a random forest.

Hope this helps!

EDIT1:

The reason I said random forests give a sort of universal feature importance is that an RF is built from many smaller decision trees, each trained on a bootstrap sample of the training set and a random subset of attributes. The bootstrap helps avoid overfitting, while the random subsets of attributes help reveal which ones are the most important. An RF can estimate the importance of each feature by averaging the out-of-bag (OOB) accuracy over the trees that use that attribute. When a feature is a strong predictor, the trees that use it get better results than those that don't. With hundreds or thousands of trees, an RF can form a very good opinion of the predictive capacity of each attribute.
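The shuffling/OOB idea described above is close to what scikit-learn's permutation_importance computes; the sketch below shows that variant under the same wine-dataset assumption, and is not the answer's exact procedure.

    # Sketch: permutation importance of a trained random forest
    from sklearn.datasets import load_wine
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = load_wine(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
    result = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
    print(result.importances_mean)  # mean accuracy drop per shuffled feature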

Note that each tree in an RF is a decision tree that chooses splits based on Gini impurity or information gain (entropy), or on variance in the case of regression. This naturally selects the features that matter most at each split.
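For a concrete sense of what a single tree minimizes at each split, here is a tiny illustration of Gini impurity; the labels and split are made up for the example.

    # Sketch: weighted Gini impurity of a candidate split
    import numpy as np

    def gini(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    left, right = np.array([0, 0, 0, 1]), np.array([1, 1, 2, 2, 2])
    n = len(left) + len(right)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    print(weighted)  # lower impurity after the split = more informative feature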

Here is an article that explores alternative ways of studying feature importance. It says more about random forests but isn't limited to that method, so it may be useful for anyone reading this post: https://towardsdatascience.com/explaining-feature-importance-by-example-of-a-random-forest-d9166011959e

Answered by 89f3a1c on December 1, 2020
