I struggle to grasp the role of gradient-based optimization in boosting ensembles. As far as I understand, boosting means sequentially combining a number of estimators (of the same type, usually decision trees): each subsequent one learns from the errors of the previous ones (by upweighting the misclassified examples, if I understand correctly), and the results are combined.
(Subquestion: does this combination mean that we use all the subsequently trained constituent estimators, perhaps with different weights, or do we just take the final one, which is assumed to be the most accurate? The sketch below shows what I mean.)
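To make my mental model concrete, here is a rough sketch of what I picture (this is not any library's actual algorithm; the decision stumps, the doubling of weights, and the unweighted vote are placeholders I made up):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_as_i_picture_it(X, y, n_rounds=5):
    """Sequentially fit stumps, upweighting whatever the previous stump got wrong."""
    weights = np.full(len(y), 1.0 / len(y))
    stumps = []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=weights)
        misclassified = stump.predict(X) != y
        weights[misclassified] *= 2.0   # upweight the errors (the factor 2 is arbitrary)
        weights /= weights.sum()
        stumps.append(stump)
    return stumps

def predict_with_all(stumps, X):
    """One reading of my subquestion: unweighted majority vote over all stumps (y in {0, 1})."""
    votes = np.stack([s.predict(X) for s in stumps])
    return (votes.mean(axis=0) >= 0.5).astype(int)

def predict_with_last(stumps, X):
    """The other reading: just take the final, presumably most accurate, stump."""
    return stumps[-1].predict(X)
```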
However, I cannot figure out how gradient descent and the learning rate come into the picture here. Trees themselves are not gradient-based learners, and combining their outputs (either way) doesn't seem to require any optimization. So what is their role?
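For concreteness, this is the part I can't place: scikit-learn's GradientBoostingRegressor exposes a learning_rate parameter just as a model trained by gradient descent would, even though the individual trees are fit by greedy splitting:

```python
from sklearn.ensemble import GradientBoostingRegressor

# The trees inside this ensemble are grown by greedy splitting, not by
# gradient descent, yet the model still takes a learning_rate. Where in
# the procedure does that factor act?
model = GradientBoostingRegressor(
    n_estimators=200,   # number of sequentially fitted trees
    learning_rate=0.1,  # the parameter whose role I'm asking about
    max_depth=3,
)
```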