
Why does Feature Importance change with each iteration of a Decision Tree Classifier?

Data Science Asked by Chris Tennant on December 29, 2020

After applying PCA to reduce the number of features, I am using a DecisionTreeClassifier for an ML problem.

[screenshot of the classifier code]
Additionally, I want to compute the feature_importances_. However, with each iteration of the DecisionTreeClassifier, the feature_importances_ change.
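Since the original code is only shown as a screenshot, here is a minimal sketch of that kind of pipeline; the dataset (load_breast_cancer) and the component count are stand-in assumptions, not the asker's actual values:

```python
# Minimal sketch of the described workflow; dataset and n_components are
# assumptions standing in for the original (screenshot-only) code.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Reduce the feature space with PCA before fitting the tree.
X_reduced = PCA(n_components=5).fit_transform(X)

clf = DecisionTreeClassifier()
clf.fit(X_reduced, y)

# Importance of each principal component in this particular fit.
print(clf.feature_importances_)
```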

Iteration #1

[screenshot: feature_importances_ output for iteration #1]

Iteration #2

[screenshot: feature_importances_ output for iteration #2]

Why would it change? I thought the initial split was made on the feature that would “produce the purest subsets (weighted by their size)”. Given the same training set, why would that split, and therefore the importances, change?

Thanks in advance for any help.

One Answer

From the sklearn.tree.DecisionTreeClassifier documentation:

The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data and max_features=n_features, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.
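For illustration, a minimal sketch of that fix (the dataset and hyperparameters are assumptions, not the asker's actual values): fitting twice with the same random_state yields identical feature_importances_.

```python
# Sketch: with random_state fixed, repeated fits on the same training data
# produce the same tree and hence the same feature_importances_.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_reduced = PCA(n_components=5).fit_transform(X)

clf_a = DecisionTreeClassifier(random_state=42).fit(X_reduced, y)
clf_b = DecisionTreeClassifier(random_state=42).fit(X_reduced, y)

# The two fits agree exactly once the randomness is pinned down.
print(np.array_equal(clf_a.feature_importances_, clf_b.feature_importances_))  # True
```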

Also, you might want to have a look at my critique of feature importance.

Correct answer by Martin Thoma on December 29, 2020
