TransWikia.com

Why does RMSProp converge faster than Momentum?

Data Science Asked on February 11, 2021

Why does RMSProp converge faster than Momentum in many cases?

Momentum:

$$v_{dw} := \beta v_{dw} + (1-\beta)\, dW$$
$$W := W - \alpha v_{dw}$$

RMSProp:

$$S_{dw} := B \cdot S_{dw} + (1-B) \cdot (dW)^2$$
$$W := W - \alpha \frac{dW}{\sqrt{S_{dw}}}$$

where $\alpha$ is the learning rate (e.g. 0.01) and $\beta$ is the momentum term (e.g. 0.9); $B$ plays the same role in RMSProp.
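For readers who prefer code, the two update rules can be sketched for a single scalar weight as below. This is a minimal illustration, not a reference implementation: the function names are made up for this example, and the small `eps` in the RMSProp denominator is an added assumption (it does not appear in the formulas above) that only guards against division by zero.

```python
import math

def momentum_step(w, v, grad, alpha=0.01, beta=0.9):
    """One momentum update for a scalar weight w with running average v."""
    v = beta * v + (1 - beta) * grad   # exponentially weighted average of gradients
    w = w - alpha * v                  # step along the averaged gradient
    return w, v

def rmsprop_step(w, s, grad, alpha=0.01, B=0.9, eps=1e-8):
    """One RMSProp update for a scalar weight w with running average s.

    eps is an assumption added here to avoid dividing by zero.
    """
    s = B * s + (1 - B) * grad ** 2               # average of squared gradients
    w = w - alpha * grad / (math.sqrt(s) + eps)   # gradient scaled by its RMS
    return w, s

# One step from w = 1.0 with gradient 2.0 (e.g. f(w) = w**2), averages start at 0.
w_mom, v = momentum_step(1.0, 0.0, 2.0)   # w_mom is about 0.998
w_rms, s = rmsprop_step(1.0, 0.0, 2.0)    # w_rms is about 0.9684
print(w_mom, w_rms)
```

On this first step the momentum update moves by $\alpha (1-\beta)\, dW$, while the RMSProp update moves by roughly $\alpha / \sqrt{1-B}$ regardless of the size of $dW$.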

From my point of view, both momentum and RMSProp have a “tendency to keep moving”. I can see how RMSProp will naturally accelerate on flat surfaces due to the factor

$$\frac{1}{\sqrt{S_{dw}}}$$

when $S_{dw}$ is small, but is there another benefit that RMSprop provides?
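The flat-surface intuition can be made concrete with a small worked case. Assume, purely for illustration, a constant non-zero gradient $dW = g$ and both running averages initialised to 0. After $t$ steps the averages are

$$S_{dw} = (1-B^t)\, g^2, \qquad v_{dw} = (1-\beta^t)\, g$$

so the two step sizes become

$$\alpha \frac{g}{\sqrt{S_{dw}}} = \frac{\alpha \, \mathrm{sign}(g)}{\sqrt{1-B^t}}, \qquad \alpha v_{dw} = \alpha (1-\beta^t)\, g$$

Under these assumptions the RMSProp step has magnitude at least $\alpha$ no matter how small $g$ is, whereas the momentum step shrinks in proportion to $g$.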

2 Answers

The basic intuition is that you should not use the same learning rate for every dimension. For instance, the loss can have a steep slope in one direction but not in another, so you should not move at the same speed in both directions. Momentum adds acceleration. Think of the current gradient as your instantaneous velocity and its running average as your average velocity; the momentum term then acts like viscosity, or a kind of friction. Near an optimum the gradients approach zero, so the running average is small and your speed changes slowly. Both methods have an alpha term, but what is actually used in the update is a running average, which is simply a kind of average that is cheap to compute. Take a look at here and here for an analogy.
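The per-dimension point can be checked on a toy problem. The sketch below is not part of the original answer: it assumes a hypothetical quadratic loss $f(w) = \frac{1}{2}(100 w_1^2 + w_2^2)$, steep in the first direction and shallow in the second, and runs both update rules from the question for 50 steps with the same $\alpha$. The `eps` term and all function names are assumptions made for this example.

```python
import math

def grad(w, curvature=(100.0, 1.0)):
    """Gradient of the assumed loss f(w) = 0.5 * (100 * w[0]**2 + w[1]**2)."""
    return [c * wi for c, wi in zip(curvature, w)]

def run_momentum(w, steps, alpha=0.01, beta=0.9):
    v = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        v = [beta * vi + (1 - beta) * gi for vi, gi in zip(v, g)]
        w = [wi - alpha * vi for wi, vi in zip(w, v)]
    return w

def run_rmsprop(w, steps, alpha=0.01, B=0.9, eps=1e-8):
    s = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        s = [B * si + (1 - B) * gi ** 2 for si, gi in zip(s, g)]
        # Each coordinate is divided by its own RMS, so the step no longer
        # depends on how steep that coordinate is.
        w = [wi - alpha * gi / (math.sqrt(si) + eps)
             for wi, gi, si in zip(w, g, s)]
    return w

w_m = run_momentum([1.0, 1.0], steps=50)
w_r = run_rmsprop([1.0, 1.0], steps=50)
print("momentum:", w_m)
print("RMSProp: ", w_r)
```

In this setup momentum moves quickly along the steep coordinate but slowly along the shallow one, because its step is proportional to the slope. RMSProp makes essentially the same progress in both coordinates, so it ends up further along the shallow direction than momentum does. This is only one toy case and should not be read as a general convergence result.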

Answered by Media on February 11, 2021

Momentum is linear in the gradient and provides speed to the update.

RMSProp contributes the exponentially decaying average of past squared gradients.

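In RMSProp, dividing by the root of this average diminishes the vertical (oscillating) movement: a direction with large gradients gets a large denominator and therefore a smaller step. With momentum, it is the averaging itself that damps this movement, because gradients of alternating sign sum to approximately 0.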

RMSProp provides the average for the update.

Adam combines RMSProp and momentum, that is, the speed and the average of the update. On average, it speeds up the directions in which a larger update is needed.

All three are faster than stochastic gradient descent without an exponentially weighted average. In the worst case, use momentum; don't settle for plain weight updates.
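How Adam combines the two ideas can be sketched for a single scalar weight as follows. This is an illustrative sketch rather than part of the original answer: the function name is hypothetical, the default values follow the commonly cited Adam defaults, and the bias-correction terms come from the standard Adam formulation rather than from the formulas in the question.

```python
import math

def adam_step(w, m, v, grad, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar weight; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # momentum part: average gradient
    v = beta2 * v + (1 - beta2) * grad ** 2   # RMSProp part: average squared gradient
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    w = w - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# First step from w = 1.0 with gradient 2.0: the step is close to alpha.
w_adam, m, v = adam_step(1.0, 0.0, 0.0, 2.0, t=1)
print(w_adam)
```

The numerator plays the role of the momentum average and the denominator plays the role of the RMSProp scaling, which is the combination described above.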

Answered by Varun Bajpai on February 11, 2021
