The general gradient boosting algorithm for tree-based classifiers is as follows:
Input: training set $\{(x_i, y_i)\}_{i=1}^{n}$, a differentiable loss function $L(y, F(x))$, number of iterations $M$.
Algorithm:
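For reference, the standard steps (in the usual formulation) are:

1. Initialize the model with a constant value: $F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)$.
2. For $m = 1, \dots, M$:
   - Compute the pseudo-residuals $r_{im} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F = F_{m-1}}$ for $i = 1, \dots, n$.
   - Fit a regression tree to the $r_{im}$, giving terminal regions $R_{jm}$, $j = 1, \dots, J_m$.
   - For each region, compute $\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L\big(y_i, F_{m-1}(x_i) + \gamma\big)$.
   - Update $F_m(x) = F_{m-1}(x) + \sum_{j=1}^{J_m} \gamma_{jm} \mathbf{1}(x \in R_{jm})$.
3. Output $F_M(x)$.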
Now the loss function that they use is the log-loss: $\sum_i -\big(y_i\log(p_i)+(1-y_i)\log(1-p_i)\big)$, where $i$ runs over all the samples in the training set. The output of a tree leaf in the algorithm is taken to be the log-odds of the samples belonging to the positive class.
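To make this concrete, here is a minimal Python sketch of that procedure for binary log-loss, keeping $F(x)$ on the log-odds scale. The function names (`sigmoid`, `fit_gbm_logloss`), the use of scikit-learn's `DecisionTreeRegressor` to grow the trees, and the hyperparameter defaults are illustrative choices, not taken from the question or the video:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    # Convert log-odds F(x) to a probability
    return 1.0 / (1.0 + np.exp(-z))

def fit_gbm_logloss(X, y, M=100, learning_rate=0.1, max_depth=3):
    """Gradient boosting for binary log-loss; F(x) is kept as log-odds."""
    # F_0: constant log-odds of the base rate, which minimizes the log-loss
    p0 = np.clip(y.mean(), 1e-12, 1 - 1e-12)
    F = np.full(len(y), np.log(p0 / (1 - p0)))
    ensemble = []
    for _ in range(M):
        prob = sigmoid(F)
        residual = y - prob                    # pseudo-residuals = negative gradient of log-loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        leaf = tree.apply(X)                   # terminal-node index of each training sample
        # Leaf output gamma_jm: one Newton step, sum(residual) / sum(p*(1-p)) over the leaf
        gamma = {j: residual[leaf == j].sum()
                    / (prob[leaf == j] * (1 - prob[leaf == j])).sum()
                 for j in np.unique(leaf)}
        F = F + learning_rate * np.array([gamma[j] for j in leaf])
        ensemble.append((tree, gamma))
    return ensemble
```

The per-leaf value here anticipates equation (1) below: the numerator is the sum of residuals and the denominator is the sum of $p(1-p)$ over the samples falling in that leaf.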
In practice, the algorithm computes $\gamma_{jm}$ by expanding the loss function in a Taylor series of degree 2 and then differentiating that expansion w.r.t. $\gamma$.
I will write the formulation for the case where there is just one sample ($x_1$) in a terminal node.
$$L(y_1, F_{m-1}(x_1)+\gamma) \approx L(y_1, F_{m-1}(x_1)) + \frac{\partial L(y_1, F_{m-1}(x_1))}{\partial F_{m-1}}\gamma + \frac{1}{2}\frac{\partial^2 L(y_1, F_{m-1}(x_1))}{\partial F_{m-1}^2}\gamma^2$$
Now, this expression is differentiated w.r.t. $\gamma$ and set equal to 0 to get the optimal value of $\gamma$:
$$\gamma = -\frac{\frac{\partial L(y_1, F_{m-1}(x_1))}{\partial F_{m-1}}}{\frac{\partial^2 L(y_1, F_{m-1}(x_1))}{\partial F_{m-1}^2}}$$
This computes to $$\gamma = \frac{\text{Residual}_1}{P_0(1-P_0)} \tag{1}$$ where $P_0$ is the most recently predicted probability.
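To see where (1) comes from, here is the short derivation, writing $p$ for the probability implied by the current log-odds $F = F_{m-1}(x_1)$:

$$L(y_1, F) = -\big(y_1 \log p + (1-y_1)\log(1-p)\big), \qquad p = \frac{1}{1+e^{-F}}$$

$$\frac{\partial L}{\partial F} = p - y_1 = -\text{Residual}_1, \qquad \frac{\partial^2 L}{\partial F^2} = p(1-p)$$

$$\Rightarrow \quad \gamma = -\frac{\partial L/\partial F}{\partial^2 L/\partial F^2} = \frac{y_1 - p}{p(1-p)} = \frac{\text{Residual}_1}{P_0(1-P_0)}$$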
The example below is taken from this YouTube video, linked to the exact time, 16:43.
For an example case where $F_0(x_1) = 0.69$ (this is a log-odds value), converting the log-odds to a probability gives $P_0 = 0.67$, and since $y_1 = 1$, $\text{Residual}_1 = y_1 - P_0 = 0.33$.
Now they compute $\gamma$ (the output value) for the leaf containing $x_1$ by plugging these values into equation (1), getting $$\gamma = \frac{0.33}{(0.67)(0.33)} = 1.49$$
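A quick numeric check of that example (the video rounds to two decimals, which is why 1.49 appears rather than the value obtained from the unrounded probability):

```python
import math

F0 = 0.69                                # initial log-odds from the example
p0 = round(1.0 / (1.0 + math.exp(-F0)), 2)   # sigmoid(0.69) ~ 0.666, rounded to 0.67
residual = 1 - p0                        # y_1 = 1, so the residual is 0.33
gamma = residual / (p0 * (1 - p0))       # equation (1) for a single-sample leaf
print(p0, round(residual, 2), round(gamma, 2))   # 0.67 0.33 1.49
```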
Questions: