Data Science Asked on May 23, 2021
As the paper by Schaul & LeCun states:
The findings are clear: in contrast to the other algorithms tested,
vSGD-fd does not require any hyper-parameter tuning to give reliably
good performance on the broad range of tests: the learning rates adapt
automatically to different curvatures and noise levels.
An auto-adapting learning rate sounds like a huge deal,
so why does everyone seem to be using Adam and other optimizers instead?
This is my answer at Cross Validated for the same question asked by the same person:
It's already 2019 and still nobody has answered this question. I don't understand why vSGD isn't popular either, but I do have a few guesses of my own:
Not really auto-adapting. Both vSGD and Adam are described as auto-adapting algorithms, but neither truly is: the learning rate and window size in vSGD, and the beta terms in Adam, all still need tuning. Newer variants such as AMSGrad and NosAdam seem to be more robust, though.
Too "complex". vSGD uses a "bprop" term to estimate the Hessian diagonal, and a later version uses finite differences instead. Both are fairly involved to implement in practice, and their numerical instability and inherent inaccuracy can cause a lot of trouble. That may be why the TensorFlow and PyTorch developers didn't include vSGD in their packages, which in turn led to very few comparisons against vSGD in subsequent optimization papers.
Speed. If the estimated Hessian is a good approximation, it certainly speeds things up. However, first, the local landscape may not be the "noisy quadratic loss" assumed in the paper; second, the estimate is very rough. In fact, no exact estimate of the Hessian diagonal is available. Together, these factors make vSGD less competitive on speed.
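To make the "rough estimate" point concrete, here is a minimal sketch of a generic per-coordinate finite-difference estimate of the Hessian diagonal on a toy quadratic loss. It is not the exact procedure from the vSGD papers. The function names (`grad`, `fd_hessian_diag`), the step size `delta`, and the noise level are all assumptions chosen for illustration.

```python
import random

def grad(theta, curvatures, noise_std=0.0, rng=None):
    """Gradient of 0.5 * sum(h_i * theta_i**2), with optional Gaussian noise.

    The noise stands in for minibatch gradient noise (an assumption of this toy).
    """
    rng = rng or random
    return [h * t + (rng.gauss(0.0, noise_std) if noise_std else 0.0)
            for h, t in zip(curvatures, theta)]

def fd_hessian_diag(theta, curvatures, delta=1e-2, noise_std=0.0, rng=None):
    """Estimate each diagonal Hessian entry as (g_i(theta + delta*e_i) - g_i(theta)) / delta."""
    g0 = grad(theta, curvatures, noise_std, rng)
    estimate = []
    for i in range(len(theta)):
        shifted = list(theta)
        shifted[i] += delta  # perturb only coordinate i
        g_shifted_i = grad(shifted, curvatures, noise_std, rng)[i]
        estimate.append((g_shifted_i - g0[i]) / delta)
    return estimate

curvatures = [0.1, 1.0, 10.0]   # true Hessian diagonal of the toy loss
theta = [1.0, -2.0, 0.5]

# With exact gradients, the estimate recovers the true curvatures.
clean = fd_hessian_diag(theta, curvatures)

# With noisy gradients, the noise is divided by delta, so the error is amplified.
noisy = fd_hessian_diag(theta, curvatures, noise_std=0.1, rng=random.Random(0))

print("true :", curvatures)
print("clean:", clean)
print("noisy:", noisy)
```

On this exactly quadratic toy loss the noiseless estimate is accurate, but once the gradients are noisy the difference of two noisy values is divided by a small `delta`, so the estimate can be far off unless it is averaged over many steps. A non-quadratic landscape would add a further error that this sketch does not model.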
This is only my understanding, without much practical experience behind it. I hope someone can point out more accurate reasons.
Answered by Andre on May 23, 2021