Data Science Asked on November 27, 2020
I would like to know the details of which criteria sklearn.tree.DecisionTreeClassifier uses to create leaf nodes, and how it applies them. I know that the parameters criterion {"gini", "entropy"}, default="gini", and splitter {"best", "random"}, default="best", are used to split nodes. However, I could not find more information about the threshold used for splitting.
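For reference, the split objective can be summarized as below. The notation follows the decision-tree section of the scikit-learn user guide: Q_m is the set of n_m samples at node m, p_mk is the fraction of those samples in class k, and θ = (j, t_m) is a candidate split on feature j at threshold t_m. Treat this as a summary, not a transcription of the library's source. With splitter="best", the candidate thresholds t_m are, as far as I can tell, the midpoints between consecutive distinct sorted values of feature j at that node.

```latex
\begin{align*}
H_{\mathrm{gini}}(Q_m) &= \sum_k p_{mk}\,(1 - p_{mk}), \qquad
H_{\mathrm{entropy}}(Q_m) = -\sum_k p_{mk} \log p_{mk} \\
Q_m^{\mathrm{left}}(\theta) &= \{(x, y) \in Q_m \mid x_j \le t_m\}, \qquad
Q_m^{\mathrm{right}}(\theta) = Q_m \setminus Q_m^{\mathrm{left}}(\theta) \\
G(Q_m, \theta) &= \frac{n_m^{\mathrm{left}}}{n_m}\, H\big(Q_m^{\mathrm{left}}(\theta)\big)
  + \frac{n_m^{\mathrm{right}}}{n_m}\, H\big(Q_m^{\mathrm{right}}(\theta)\big) \\
\theta^{*} &= \arg\min_{\theta = (j,\, t_m)} G(Q_m, \theta)
\end{align*}
```

Minimizing G is the same as maximizing the impurity decrease H(Q_m) - G(Q_m, θ), because H(Q_m) does not depend on θ.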
There are two approaches involved in the creation of leaf nodes: post-pruning (cutting back the tree after it has been built) and pre-pruning (preventing overfitting by stopping the tree-building process early). Knowing more about the splitting criteria would help me understand these models better and customize them further.
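The threshold search can be checked empirically. Below is a minimal sketch. It assumes that the default splitter="best" performs an exhaustive search over midpoints between consecutive distinct values of a feature and keeps the split with the lowest weighted child impurity. The helpers impurity and best_threshold are hypothetical names written for illustration and are not part of sklearn. The only sklearn pieces used are DecisionTreeClassifier and the fitted tree's tree_.threshold array, whose entry 0 holds the root node's threshold.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def impurity(labels, criterion="gini"):
    """Gini or entropy impurity of a 1-D array of class labels (hypothetical helper)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    if criterion == "gini":
        return 1.0 - np.sum(p ** 2)
    # Entropy; every p here is > 0 because np.unique only returns classes present.
    return -np.sum(p * np.log2(p))


def best_threshold(x, y, criterion="gini"):
    """Exhaustive search over midpoints of consecutive distinct values of one feature."""
    values = np.unique(x)  # sorted distinct values
    best_t, best_score = None, np.inf
    for t in (values[:-1] + values[1:]) / 2.0:
        left, right = y[x <= t], y[x > t]
        # Weighted child impurity; the lowest value gives the largest impurity decrease.
        score = (len(left) * impurity(left, criterion)
                 + len(right) * impurity(right, criterion)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t


# Toy single-feature data chosen so that the best split is unique.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([0, 0, 1, 0, 1, 1, 1, 1])

for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    clf.fit(x.reshape(-1, 1), y)
    # Hand-rolled threshold vs. the root threshold sklearn actually learned.
    print(criterion, best_threshold(x, y, criterion), clf.tree_.threshold[0])
```

On this toy data both columns should show 4.5 for both criteria. My understanding of the source is that splitter="random" instead draws a single random threshold between the minimum and maximum of each candidate feature at the node, so its thresholds need not be midpoints.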
Pre-pruning is handled by a variety of parameters: max_depth, min_samples_split, min_samples_leaf, min_weight_fraction_leaf, max_leaf_nodes, and min_impurity_decrease.
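These parameters act as stopping rules. A node stays a leaf when it is pure or when none of its candidate splits passes them. For example, per the sklearn documentation, min_impurity_decrease only allows a split when N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity) is at least the given value. Here N is the total number of samples and N_t, N_t_L, N_t_R count the samples at the node and its two children. The sketch below fits the same classifier on the bundled iris data with one parameter at a time. It reports depth, leaf count and smallest leaf size using get_depth(), get_n_leaves() and apply(). The particular parameter values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# One pre-pruning parameter per tree (values picked arbitrarily for illustration).
settings = {
    "unpruned": {},
    "max_depth=2": {"max_depth": 2},
    "min_samples_leaf=10": {"min_samples_leaf": 10},
    "max_leaf_nodes=4": {"max_leaf_nodes": 4},
    "min_impurity_decrease=0.01": {"min_impurity_decrease": 0.01},
}

trees = {}
for name, params in settings.items():
    clf = DecisionTreeClassifier(random_state=0, **params).fit(X, y)
    trees[name] = clf
    # apply() maps each sample to the index of the leaf it ends up in.
    leaf_sizes = np.bincount(clf.apply(X))
    smallest = leaf_sizes[leaf_sizes > 0].min()
    print(f"{name:28s} depth={clf.get_depth()} "
          f"leaves={clf.get_n_leaves()} smallest leaf={smallest}")
```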
Post-pruning is relatively new to sklearn (it was added in version 0.22) and is accomplished with minimal cost-complexity pruning, controlled by the parameter ccp_alpha.
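The sklearn documentation defines the cost-complexity measure of a tree T as R_α(T) = R(T) + α|T|. Here R(T) is the total sample-weighted impurity of its leaves and |T| is the number of leaves. Pruning repeatedly removes the "weakest link", the subtree with the smallest effective alpha, until that smallest effective alpha exceeds ccp_alpha. The sketch below follows the spirit of sklearn's own pruning example, using the bundled breast-cancer dataset with an arbitrary train/test split. It calls cost_complexity_pruning_path to list the effective alphas, then refits with each one as ccp_alpha.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Effective alphas at which subtrees of the fully grown tree get pruned away.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
ccp_alphas = path.ccp_alphas

results = []
for alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    results.append((alpha, clf.get_n_leaves(), clf.score(X_test, y_test)))

for alpha, n_leaves, acc in results:
    print(f"alpha={alpha:.5f} leaves={n_leaves:3d} test accuracy={acc:.3f}")
```

Larger alphas give smaller trees, and the last alpha in the path prunes the tree down to a single leaf. Choosing ccp_alpha then becomes a model-selection problem, for example picking the value with the best validation score.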
Answered by Ben Reiniger on November 27, 2020