Why isn't the vSGD-fd optimization algorithm popular?

Data Science Asked on May 23, 2021

As the paper by Schaul & LeCun states:

The findings are clear: in contrast to the other algorithms tested,
vSGD-fd does not require any hyper-parameter tuning to give reliably
good performance on the broad range of tests: the learning rates adapt
automatically to different curvatures and noise levels.

An auto-adapting learning rate sounds like a huge deal,
so why does everyone seem to be using Adam and other optimizers?

One Answer

This is my answer at Cross Validated for the same question asked by the same person:

It's already 2019 and still nobody has answered this question. I don't understand why vSGD isn't popular either, but I do have a few guesses of my own:

  1. Not truly auto-adapting. Like vSGD, Adam is described as an auto-adapting algorithm, but in practice neither fully is. The learning rate and window size in vSGD and the beta terms in Adam all still need tuning. Newer variants such as AMSGrad and NosAdam seem to be more robust, though.

  2. Too "complex". vSGD uses a "bprop" term to estimate the Hessian diagonal, and a later version (vSGD-fd) uses a finite-difference estimate instead. Both are fairly involved to implement in practice, and their numerical instability and inherent inaccuracy can cause a lot of trouble. That may be why the TensorFlow and PyTorch developers did not include vSGD in their packages, which in turn may explain why later optimization papers rarely compare against it.

  3. Speed. If the estimated Hessian is a good approximation, it certainly speeds up convergence. However, first, the local landscape may not match the "noisy quadratic loss" assumed in the paper; second, the estimate is very rough. In fact, no exact estimate of the Hessian diagonal is available. Together these factors make vSGD less competitive in speed.
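
To make the accuracy concern in points 2 and 3 concrete, here is a minimal sketch of a finite-difference curvature estimate on a toy quadratic. It is not the paper's implementation: the function name `fd_diag_curvature`, the toy loss, the fixed perturbation `delta`, and the noise level are all assumptions chosen for illustration.

```python
import numpy as np

def fd_diag_curvature(grad_fn, theta, delta):
    """Elementwise estimate |g(theta + delta) - g(theta)| / |delta|.

    This equals the Hessian diagonal only when the Hessian is diagonal,
    because every coordinate is perturbed at the same time.
    """
    return np.abs(grad_fn(theta + delta) - grad_fn(theta)) / np.abs(delta)

# Toy loss 0.5 * sum(h * theta**2), so the true Hessian diagonal is h_true.
rng = np.random.default_rng(0)
h_true = np.linspace(0.5, 5.0, 10)
theta = np.ones(10)
delta = np.full(10, 1e-3)  # hypothetical fixed perturbation

def exact_grad(t):
    return h_true * t

def noisy_grad(t, noise_std=0.1):
    # Stand-in for a mini-batch gradient: exact gradient plus Gaussian noise.
    return h_true * t + rng.normal(0.0, noise_std, size=t.shape)

h_clean = fd_diag_curvature(exact_grad, theta, delta)
h_noisy = fd_diag_curvature(noisy_grad, theta, delta)

err_clean = np.max(np.abs(h_clean - h_true))
err_noisy = np.max(np.abs(h_noisy - h_true))
print(f"max error, exact gradients: {err_clean:.2e}")
print(f"max error, noisy gradients: {err_noisy:.2e}")
```

With exact gradients the estimate recovers the curvature almost perfectly, but with noisy gradients the error is roughly the gradient noise divided by the perturbation size, so a single raw estimate can be far off. As far as I understand, the actual algorithm averages such estimates over time to damp this, which this sketch leaves out.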

This is just my understanding, without much hands-on practice. I hope someone can offer more accurate points.

Answered by Andre on May 23, 2021
