TransWikia.com

Feature importance of random forests

Data Science Asked on June 30, 2021

I have a dataset with 11 features. I noticed that manipulating these features (e.g. dropping one or several of them) doesn’t affect the error scores on the training and testing data, so I checked the feature importances. Here is the result:
[image: feature importance plot]

As shown, the first feature has a very high contribution, while the rest have insignificant importance. So I tried running the model using only the first feature. I expected that the scores would not degrade significantly, since the 10 dropped features have very low importance. However, with only the first feature, the absolute error percentage on the testing data increased significantly, from 14.13010% to 22.96036%. Why is this happening? I expected the error to stay close to the baseline testing result, since I am training on the feature that dominates the feature importance.

Also, some of these features are correlated (correlation no higher than 0.62). Is this why the importance scores can’t be relied on? If so, what metric can I use to assess feature importance for correlated features?

One Answer

I can't give you a perfect answer because there is no code, no dataset, and no statement of what you want to achieve.

The built-in feature importances of a random forest (typically impurity-based) are calculated from the training data given to the model, not from predictions on a test dataset. That means they do not measure the true predictive power of a feature. You should check whether the training and test results differ when you run the random forest model. Another option is permutation feature importance.
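To make the difference concrete, here is a minimal sketch that compares the two kinds of importance. It assumes scikit-learn (the question does not say which library was used), and the data is a synthetic stand-in for the real dataset: 11 features, where feature 0 dominates, feature 1 is correlated with it at about 0.6, and feature 2 carries a smaller signal. All of these values are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's data (assumption, not the real dataset).
rng = np.random.default_rng(0)
n_samples = 600
X = rng.normal(size=(n_samples, 11))
# Make feature 1 correlated with feature 0 (correlation of about 0.6).
X[:, 1] = 0.6 * X[:, 0] + 0.8 * rng.normal(size=n_samples)
# The target depends strongly on feature 0 and weakly on feature 2.
y = 3.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(scale=0.5, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Impurity-based importances: computed from the training data only.
impurity_importance = model.feature_importances_

# Permutation importance: drop in the held-out score when one feature
# column is shuffled, averaged over n_repeats shuffles.
perm = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
perm_importance = perm.importances_mean

# Print features ordered by permutation importance, highest first.
for i in np.argsort(perm_importance)[::-1]:
    print(
        f"feature {i}: impurity={impurity_importance[i]:.3f}  "
        f"permutation={perm_importance[i]:.3f} "
        f"+/- {perm.importances_std[i]:.3f}"
    )
```

Because the permutation scores are measured on held-out data, a feature that looks negligible in the impurity-based ranking but still costs accuracy when shuffled will show up here. With correlated features, both rankings can split or shift credit between the correlated columns, so it is worth reading the two side by side rather than trusting either alone.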

However, I had a similar experience and solved it in another way.

With these 4 options, I got a better view into my dataset. Hope I could help you a bit.

Answered by martin on June 30, 2021
