TransWikia.com

Feature Importance without Random Forest Feature Importances

Data Science Asked on March 1, 2021

Is there an intuitive way of finding feature importances without just using the random forest feature importances method?

I have a binary logistic regression problem where I have binary features (1 or 0) and a binary target (1 or 0).

I want to see which features are most important for predicting the target and rank them.

I did an odds ratio for each feature, which gave me some idea of importance.
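For reference, the per-feature odds ratio mentioned above could be computed and ranked roughly as follows. This is only a minimal sketch: the toy data, the feature names (f_a, f_b, f_c) and the 0.5 smoothing term (the Haldane-Anscombe correction, added so that an empty cell does not cause a division by zero) are assumptions for illustration, not part of the original problem.

```python
from math import log

def odds_ratio(feature, target, smoothing=0.5):
    """Odds ratio of target == 1 for feature == 1 versus feature == 0.

    `smoothing` is added to every cell of the 2x2 table (assumed
    Haldane-Anscombe correction) to avoid dividing by zero.
    """
    a = b = c = d = 0
    for x, y in zip(feature, target):
        if x == 1 and y == 1:
            a += 1          # feature present, target positive
        elif x == 1 and y == 0:
            b += 1          # feature present, target negative
        elif x == 0 and y == 1:
            c += 1          # feature absent, target positive
        else:
            d += 1          # feature absent, target negative
    a, b, c, d = (v + smoothing for v in (a, b, c, d))
    return (a * d) / (b * c)

def rank_by_odds_ratio(features, target):
    """Rank features by |log(odds ratio)|, so that an odds ratio far
    below 1 counts as much as one far above 1."""
    scores = {name: odds_ratio(col, target) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(log(kv[1])), reverse=True)

# Toy data, purely illustrative.
target = [1, 1, 1, 0, 0, 0, 1, 0]
features = {
    "f_a": [1, 1, 1, 0, 0, 0, 1, 0],  # identical to the target
    "f_b": [1, 0, 1, 0, 1, 0, 1, 0],  # partly associated with the target
    "f_c": [1, 1, 0, 0, 1, 1, 0, 0],  # unrelated to the target
}

for name, ratio in rank_by_odds_ratio(features, target):
    print(name, round(ratio, 3))
```

Ranking on the absolute log of the odds ratio is one possible design choice; it treats a feature that strongly lowers the odds of the target as just as informative as one that strongly raises them.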

Are there any other methods?

2 Answers

Global Explanation:

The overall importance of a feature in a decision tree (this also applies to random forests and GBDT) can be computed in the following ways:

  • ‘weight’: the number of times a feature is used to split the data across all trees.

  • ‘gain’: the average gain across all splits the feature is used in.

  • ‘cover’: the average coverage across all splits the feature is used in.

  • ‘total_gain’: the total gain across all splits the feature is used in.

  • ‘total_cover’: the total coverage across all splits the feature is used in.

These definitions are taken from the xgboost API, where they are the available importance_type options.
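To make the five definitions above concrete, here is a small sketch that computes them from a list of split records. The record layout (feature name, gain, cover) and the numbers are hypothetical and chosen only for illustration; this is not how xgboost stores its trees internally.

```python
from collections import defaultdict

# Hypothetical split records collected across all trees:
# (feature name, gain of the split, cover of the split).
splits = [
    ("f0", 10.0, 100.0),
    ("f1", 4.0, 60.0),
    ("f0", 6.0, 40.0),
    ("f2", 1.0, 20.0),
]

def importance(splits, importance_type="weight"):
    """Aggregate split records into one importance score per feature."""
    count = defaultdict(int)
    total_gain = defaultdict(float)
    total_cover = defaultdict(float)
    for feature, gain, cover in splits:
        count[feature] += 1
        total_gain[feature] += gain
        total_cover[feature] += cover

    if importance_type == "weight":
        return dict(count)
    if importance_type == "total_gain":
        return dict(total_gain)
    if importance_type == "total_cover":
        return dict(total_cover)
    if importance_type == "gain":
        # Average gain over the splits that use the feature.
        return {f: total_gain[f] / count[f] for f in count}
    if importance_type == "cover":
        # Average cover over the splits that use the feature.
        return {f: total_cover[f] / count[f] for f in count}
    raise ValueError(f"unknown importance_type: {importance_type}")

for kind in ("weight", "gain", "cover", "total_gain", "total_cover"):
    print(kind, importance(splits, kind))
```

With a trained xgboost model you would not compute these by hand; the Booster object's get_score method accepts an importance_type argument with the same five names.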

Local Explanations:

If you want to get individual examples of why a prediction was made you can use either

There are plenty of ways to achieve better model explainability and accountability. I recommend this book.

Answered by Carlos Mougan on March 1, 2021

There are many ways to estimate feature importance. Personally, I think the random forest measures get overused simply because they have “importance” in their name and many people have heard of them. What people often don’t realize is that the features a random forest deems important are important for the random forest: they are good at predicting your target in a random forest setting. Blindly applying that ranking to problems solved with other models is dangerous, because what is predictive for a random forest may not be predictive for some other algorithm. These random forest importance measures also have faults of their own; for example, the impurity-based measure is biased towards variables that offer many possible split points, such as continuous variables with wide ranges.

There are many other methods available for variable importance, such as information gain and Relief. I suggest you read this paper by Robnik-Sikonja:

https://link.springer.com/content/pdf/10.1007%2F978-3-540-39857-8_30.pdf

It covers many different methods.
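As an illustration of one of the methods named above, information gain for a single feature can be sketched as the entropy of the target minus the entropy that remains after splitting on that feature. The toy data below is an assumption made up for the example; for the binary features in the question each feature simply splits the data into two groups.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature, target):
    """Entropy of the target minus its entropy conditioned on the feature."""
    n = len(target)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, target) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(target) - conditional

# Toy data, purely illustrative.
target = [1, 1, 0, 0]
informative = [1, 1, 0, 0]    # separates the classes perfectly
uninformative = [1, 0, 1, 0]  # tells us nothing about the target

print(information_gain(informative, target))
print(information_gain(uninformative, target))
```

Unlike the random forest measures, this score does not depend on any fitted model, but it looks at each feature on its own and so ignores interactions between features.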

Answered by astel on March 1, 2021
