Data Science Asked by James Flash on January 2, 2021
What is the parameter max_features in DecisionTreeClassifier responsible for?
I thought it defined the number of features the tree uses to build its nodes. But regardless of the value I give this parameter (1 or 2), my tree still uses both of the features I have. So what does it actually change?
[Tree plots for max_features = 2 and max_features = 1 omitted]
You can see that x1 and x2 are used in both cases.
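For reference, here is a minimal sketch of how one might reproduce this observation. The dataset from make_classification is a stand-in assumption, not the asker's actual data; inspecting tree_.feature shows which features appear in the splits.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical 2-feature dataset (assumption for illustration only)
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

for mf in (1, 2):
    tree = DecisionTreeClassifier(max_features=mf, random_state=0).fit(X, y)
    # tree_.feature holds the feature index used at each internal node
    # (negative values mark leaves)
    used = sorted({f for f in tree.tree_.feature if f >= 0})
    print(f"max_features={mf}: features used in splits -> {used}")
```

In both runs the tree typically ends up splitting on both features, which is exactly the behaviour being asked about.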
max_features is the number of features to consider each time a split decision is made. Say your data has 50 features and max_features is 10: each time the tree needs to find a split, it randomly selects 10 features and picks the best of those 10 to split on. At the next node it randomly selects another 10, and so on.
This mechanism is used to control overfitting. In fact, it is the same technique used in a random forest, except that a random forest additionally samples the data (bootstrapping) and builds multiple trees.
So even if you set max_features to 10, a deep enough tree can end up using all of the features; the parameter only limits the candidate set to 10 at each individual split.
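Here is a minimal sketch of that point, assuming a synthetic 50-feature dataset from make_classification to mirror the numbers above. Only 10 candidate features are considered at any single split, yet the fitted tree as a whole usually uses far more than 10 distinct features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical dataset with 50 features (assumption for illustration only)
X, y = make_classification(n_samples=1000, n_features=50, n_informative=20,
                           random_state=0)

# Each split considers a random subset of 10 features...
tree = DecisionTreeClassifier(max_features=10, random_state=0).fit(X, y)

# ...but across all splits of the deep tree, many more features appear.
used = np.unique(tree.tree_.feature[tree.tree_.feature >= 0])
print(f"distinct features used across all splits: {len(used)}")
```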
If you compare the definition of max_features for the decision tree and for the random forest, you will see that they are the same:
https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
Correct answer by Bashar Haddad on January 2, 2021