
How to interpret gradient descent in boosting ensembles?

Asked on Data Science, March 9, 2021

I struggle to grasp the role of gradient-based optimization in boosting ensembles. As far as I understand, boosting means training a bunch of estimators (of the same type, usually decision trees) sequentially, so that each subsequent one learns from the errors of the previous ones (by upweighting the misclassified examples, if I understand correctly), and then combining the results.
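To make my mental model concrete, here is a rough sketch of what I have in mind (my own toy code, not any library's actual algorithm; the depth-1 trees and the upweighting factor of 2 are arbitrary choices of mine):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

weights = np.full(len(y), 1.0 / len(y))    # start with uniform example weights
trees = []
for _ in range(5):
    tree = DecisionTreeClassifier(max_depth=1, random_state=0)
    tree.fit(X, y, sample_weight=weights)  # each new tree sees the reweighted data
    missed = tree.predict(X) != y
    weights[missed] *= 2.0                 # upweight what this tree got wrong
    weights /= weights.sum()               # renormalize
    trees.append(tree)
```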

(Subquestion: does this combination mean that we use all of the sequentially trained constituent estimators, perhaps with different weights, or do we just take the final one, which is presumably the most accurate?)
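To put the subquestion in symbols (my own notation): is the final prediction something like

$$F(x) = \sum_{m=1}^{M} \alpha_m \, h_m(x),$$

where the $h_m$ are the individual trees and the $\alpha_m$ are per-tree weights, or simply $F(x) = h_M(x)$, the last tree alone?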

However, I cannot figure out how gradient descent and the learning rate come into the picture here. Trees themselves are not gradient-based learners, and combining the outputs (either way) doesn't require any optimization. So what role do they play?
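For instance, if I simply combine a handful of trees by a majority vote (again my own toy code, with independently trained stumps standing in for the sequentially trained ones), there is no gradient step anywhere that I can see:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Stand-ins for the boosted trees; max_features=1 just makes the stumps differ.
trees = [DecisionTreeClassifier(max_depth=1, max_features=1, random_state=i).fit(X, y)
         for i in range(5)]

votes = np.array([t.predict(X) for t in trees])      # one row of predictions per tree
majority = (votes.mean(axis=0) >= 0.5).astype(int)   # plain majority vote, nothing to optimize
```

Where would a learning rate even enter a procedure like this?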
