Data Science Asked by crommy on March 1, 2021
So this is taken from an exam I just did. I’d like to know if there are any instances same as in the image where the CART algorithm could use a negative alpha and thus encourage a larger tree? Or does the algorithm state that alpha must be a non negative integer at all times?
Making an additional split will always decrease $Err$ (until pure leaves are reached), so any negative $alpha$ will have the same minimizing $T$ as $alpha=0$. So we might as well take $alphageq0$. (But there is no requirement that $alpha$ be an integer.)
Erm, unless you implement things so that even a pure node can be split, in which case a negative $alpha$ will make the tree split even these, until every leaf consists of a single sample (and duplicates thereof) or the maximum depth is reached.
Answered by Ben Reiniger on March 1, 2021
Or does the algorithm state that alpha must be a non negative integer at all times?
As far as I know that is actually the case:
In "The Elements of Statistical Learning" (1) on page 308 the authors write:
The tuning parameter $alpha geq 0$ governs the tradeoff between tree size and its goodness of fit to the data.
Also, the scikit-learn documentation says:
Minimal cost-complexity pruning is an algorithm used to prune a tree to avoid over-fitting, described in Chapter 3 of [BRE]. This algorithm is parameterized by $alpha geq 0$ known as the complexity parameter.
Both sources refer to "Classification and Regression Trees" (2) as the original source.
(1) "The Elements of Statistical Learning"; Hastie et al; 2nd Ed; 2008
(2) "Classification and Regression Trees"; Breiman, et al; 1984
Answered by Sammy on March 1, 2021
? is a parameter used for regularization which is controlling the tree from being overfit , an inherent issue with CART algorithms. A negative ? would lead to a larger tree which may accentuate the overfitting issue
Answered by vivek on March 1, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP