Data Science Asked by Chris Tennant on December 29, 2020
After applying PCA to reduce the number of features, I am using a DecisionTreeClassifier for an ML problem.
Additionally, I want to compute the feature_importances_. However, with each iteration of the DecisionTreeClassifier, the feature_importances_ change.
Iteration #1: (feature_importances_ output)
Iteration #2: (feature_importances_ output)
Why would it change? I thought the initial split was made on the feature that “produces the purest subsets (weighted by their size)”. Since the classifier is acting on the same training set, why would that change?
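A minimal sketch of this setup (the synthetic data via make_classification is an assumption; the original uses its own training set): the same tree is fitted twice on the same PCA-reduced data and feature_importances_ is printed each time.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real training set (assumption for illustration)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_reduced = PCA(n_components=5).fit_transform(X)  # reduce features with PCA

# Fit the same classifier twice on the same reduced training data
for i in (1, 2):
    tree = DecisionTreeClassifier()  # note: no random_state fixed
    tree.fit(X_reduced, y)
    print(f"Iteration #{i}:", np.round(tree.feature_importances_, 3))
# The two printed importance vectors can differ from run to run.
```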
Thanks in advance for any help.
From the sklearn.tree.DecisionTreeClassifier help:
The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data and max_features=n_features, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.
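A minimal sketch of the behaviour the docs describe (synthetic data is an assumption): with random_state fixed, repeated fits produce identical splits and therefore identical feature_importances_.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real training set (assumption for illustration)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

tree_a = DecisionTreeClassifier(random_state=42).fit(X, y)
tree_b = DecisionTreeClassifier(random_state=42).fit(X, y)

# Identical splits, hence identical importances, on every run
assert (tree_a.feature_importances_ == tree_b.feature_importances_).all()
```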
Also, you might want to have a look at my critique on feature importance.
Correct answer by Martin Thoma on December 29, 2020