Data Science Asked on June 30, 2021
I have a dataset with 11 features. I noticed that manipulating these features (e.g. dropping one or some of them) doesn't affect the error scores on the training and testing data, so I checked the importance of these features. Here is what I found:
As noticed, the first feature has a very high contribution, while the rest have insignificant importance. So I tried to run the model using only the first feature. I expected the scores not to decrease significantly, since the 10 dropped features have very low feature importance. However, after running the experiment with only the first feature, the absolute error percentage on the testing data increased significantly, from 14.13010% to 22.96036%. Why is this happening? I expected the error to stay close to the baseline testing results, since I train on the feature that dominates the feature importance.
Also, some of these features are correlated (no more than 0.62 correlation). Is this the reason why the scores aren't reliable? If so, what metric can I use to test feature importance for correlated features?
I can't give you a perfect answer because there is no code or dataset, and the target you want to achieve isn't stated.
This happens because the feature importances from a random forest are calculated on the training data given to the model, not on predictions on a test dataset. That means they do not reflect true predictive power. You should check whether there is a difference between training and test results when you run a random forest model. Another option is permutation feature importance.
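As an illustration of that difference, here is a minimal sketch comparing the impurity-based importances with permutation importance scored on held-out data (the synthetic dataset and hyperparameters are placeholders, not taken from the question):

```python
# Minimal sketch: impurity-based vs. permutation feature importance.
# The synthetic regression data below is a placeholder for the real dataset.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=11, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Impurity-based importances: derived from the training data only.
print("impurity-based:", model.feature_importances_)

# Permutation importance: the drop in test-set score when one feature's
# values are shuffled, so it reflects predictive power on unseen data.
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
print("permutation (test):", result.importances_mean)
```

Be aware that permutation importance can itself be misleading when features are strongly correlated: shuffling one feature barely hurts the score if a correlated feature carries the same information, so both can look unimportant even though the pair matters.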
I had a similar experience and solved it in a different way.
With these 4 options, I got a better view of my dataset. I hope this helps a bit.
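For the correlated-features part of the question, one commonly used approach (a sketch under my own assumptions; the clustering threshold and data are placeholders) is to cluster features by their Spearman rank correlation and keep a single representative per cluster before fitting and computing importances:

```python
# Minimal sketch: group correlated features by hierarchical clustering and
# keep one representative per cluster. Threshold and data are placeholders.
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=11, noise=10, random_state=0)
rng = np.random.default_rng(0)
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=len(X))  # introduce correlation

# Distance = 1 - |Spearman correlation|: highly correlated features are close.
corr, _ = spearmanr(X)
dist = 1 - np.abs(corr)
linkage = hierarchy.ward(squareform(dist, checks=False))
cluster_ids = hierarchy.fcluster(linkage, t=0.5, criterion="distance")

# Keep the first feature of each cluster; refit the model on the reduced set.
selected = [int(np.where(cluster_ids == c)[0][0])
            for c in np.unique(cluster_ids)]
X_reduced = X[:, selected]
print("kept feature indices:", selected)
```

With one representative per correlated group, importance is no longer split across near-duplicates; that splitting may also explain why dropping individually "unimportant" features increased your test error.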
Answered by martin on June 30, 2021