
Feature importance after classification

Data Science: Asked by Rick0 on July 5, 2021

I have time series data with roughly 200 features per sample, and I used a recurrent neural network for a binary classification task.
After classification, I would like to know which features contribute most to one of the targets (say, target = 1). Any suggested methods? Thank you

3 Answers

You may use permutation importance (a minimal sketch follows the steps below):

- Get your baseline score.
- Permute the values of one feature (you may also replace them with random values).
- Compute the score again.
- The drop in score is that feature's importance.
- Repeat for all the features.
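A model-agnostic sketch of that loop, assuming a model with a predict method and tabular validation data of shape (n_samples, n_features); for a recurrent network fed (n_samples, n_timesteps, n_features) you would shuffle along the feature axis instead:

```python
import numpy as np

def permutation_importance(model, X_val, y_val, score_fn, n_repeats=5, seed=0):
    """Importance of feature j = drop in score after shuffling column j."""
    rng = np.random.default_rng(seed)
    baseline = score_fn(y_val, model.predict(X_val))
    importances = np.zeros(X_val.shape[1])
    for j in range(X_val.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X_val.copy()
            rng.shuffle(X_perm[:, j])  # break the link between feature j and the target
            drops.append(baseline - score_fn(y_val, model.predict(X_perm)))
        importances[j] = np.mean(drops)  # average over repeats to reduce noise
    return importances
```

With a scorer such as sklearn.metrics.accuracy_score passed as score_fn, larger values mean the model relied more on that feature.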

…Breiman and Cutler also described permutation importance, which measures the importance of a feature as follows. Record a baseline accuracy (classifier) or R2 score (regressor) by passing a validation set or the out-of-bag (OOB) samples through the Random Forest. Permute the column values of a single predictor feature and then pass all test samples back through the Random Forest and recompute the accuracy or R2 score.

To check the importance for an individual class (0 or 1), extend the same idea: after permuting a feature, check whether the errors increase more among false positives or false negatives.
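A small sketch of that check, building on the loop above (the helper name and the use of sklearn's confusion_matrix are my additions, not from the answer):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def permutation_error_shift(model, X_val, y_val, feature_idx, seed=0):
    # Confusion counts before permutation.
    tn, fp, fn, tp = confusion_matrix(y_val, model.predict(X_val)).ravel()
    # Permute one feature column and recount.
    rng = np.random.default_rng(seed)
    X_perm = X_val.copy()
    rng.shuffle(X_perm[:, feature_idx])
    tn2, fp2, fn2, tp2 = confusion_matrix(y_val, model.predict(X_perm)).ravel()
    # A larger jump in false negatives suggests the feature matters more
    # for recognising the positive class (target = 1).
    return {"fp_increase": fp2 - fp, "fn_increase": fn2 - fn}
```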

Read "Beware Default Random Forest Importances" for more explanation.

A few other quotes from the page:

Any machine learning model can use the strategy of permuting columns to compute feature importances. This fact is under-appreciated in academia and industry.

The permutation mechanism is much more computationally expensive than the mean decrease in impurity mechanism, but the results are more reliable. The permutation importance strategy does not require retraining the model after permuting each column; we just have to re-run the perturbed test samples through the already-trained model.

Correct answer by 10xAI on July 5, 2021

Another possible solution is to use L1 regularisation. Lasso regression can act as a proxy for feature selection: because the gradient of the L1 norm is a constant-magnitude step function (the sign of the weight), training drives the weights of uninformative features to exactly zero, so the weight associated with a given feature ends up either at zero or clearly away from it, depending on its importance for predicting the output.

Moreover, sklearn provides sklearn.feature_selection.SelectFromModel, a meta-transformer that performs feature selection from an already-trained model. If you run it on a Lasso regressor and compare the selected features against the model weights, you will see the correlation between weight magnitude and selection.
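A short sketch of that comparison (the synthetic data and the alpha value are illustrative assumptions, not from the answer):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: 200 features, only a few informative.
X, y = make_classification(n_samples=500, n_features=200,
                           n_informative=10, random_state=0)
X = StandardScaler().fit_transform(X)  # put all weights on a common scale

lasso = Lasso(alpha=0.02).fit(X, y)    # alpha controls how many weights hit zero

selector = SelectFromModel(lasso, prefit=True)
kept = selector.get_support(indices=True)

# The selected features are exactly those whose weights survived the L1 penalty.
print(kept)
print(lasso.coef_[kept])
```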

Answered by RonsenbergVI on July 5, 2021

Non-linear models are complex enough that a single global feature importance cannot be derived (in the sense that increasing one feature would always push the model toward a particular class).

So you cannot expect that increasing one feature will make the model vote more for one class, since the model is non-linear. For example, have a look at the TensorFlow Playground and consider the dataset with the two circles.

What you can do, though, is derive the feature importance locally, since you can locally approximate the neural network by a linear function. This can explain the behaviour and the feature importance, but only in a small neighbourhood around the current input. At a different input, the behaviour could be completely different!
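One common way to obtain such a local linear approximation is the input gradient (saliency); LIME is another. A minimal sketch, assuming a PyTorch model that maps a (timesteps, features) input to a single class-1 logit (the function name and shapes are my assumptions):

```python
import torch

def local_feature_importance(model, x):
    # x: tensor of shape (timesteps, features) for one sample.
    x = x.clone().detach().requires_grad_(True)
    logit = model(x.unsqueeze(0)).squeeze()  # class-1 logit for this sample
    logit.backward()
    # The gradient is the weight vector of the local linear approximation;
    # summing its magnitude over time gives one score per feature.
    return x.grad.abs().sum(dim=0)
```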

Answered by Graph4Me Consultant on July 5, 2021
