Data Science Asked on June 16, 2021
While using the "Two class neural network" module in Azure ML, I encountered the "Momentum" property. The documentation, which is not very clear, says:
For The momentum, type a value to apply during learning as a weight on
nodes from previous iterations.
That is still not very clear to me. Can someone please explain?
Momentum in neural networks is a variant of stochastic gradient descent: it replaces the raw gradient with a momentum term, an exponentially decaying aggregate of past gradients, as very well explained here.
It is also the common name given to the momentum factor, as in your case.
Maths
The momentum factor is a coefficient applied to an extra term in the weight update. The standard form (the one illustrated in the Visual Studio Magazine post the original image came from) is:

Δw(t) = -η ∂E/∂w + α Δw(t-1)

where η is the learning rate, ∂E/∂w is the current gradient of the error, α is the momentum factor, and Δw(t-1) is the weight update from the previous iteration.
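A minimal sketch of this update rule in plain Python (toy 1-D objective and hyperparameter values chosen for illustration, nothing Azure-specific):

```python
# Gradient descent with momentum on the toy objective f(w) = w**2.
# The velocity v accumulates an exponentially decaying sum of past updates:
# v(t) = alpha * v(t-1) - eta * grad(w), then w is moved by v.

def grad(w):
    # derivative of f(w) = w**2
    return 2.0 * w

eta = 0.1     # learning rate
alpha = 0.9   # momentum factor
w, v = 5.0, 0.0

for _ in range(200):
    v = alpha * v - eta * grad(w)  # momentum term: alpha times the previous update
    w = w + v                      # apply the accumulated update

print(w)  # ends up very close to the minimum at w = 0
```

With alpha = 0 this reduces to plain gradient descent; larger alpha makes each step remember more of the previous steps.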
Advantages
Among other benefits, momentum is known to speed up learning and to help avoid getting stuck in local minima.
Intuition behind
As is really nicely explained in this Quora post, the idea of momentum comes from physics:
Momentum is a physical property that enables an object with mass to continue along its trajectory even when an external opposing force is applied, which means overshooting. For example, if one speeds up a car and then suddenly hits the brakes, the car will skid and stop a short distance past the mark on the ground, overshooting it.
The same concept applies to neural networks: during training, the update direction tends to resist change when momentum is added to the update scheme. When the neural net approaches a shallow local minimum, it's like applying the brakes, but not enough to instantly change the update direction and magnitude. Hence neural nets trained this way will overshoot past smaller local minima and stop only in a deeper, global minimum.
Thus momentum in neural nets helps them escape local minima so that a deeper, more important minimum can be found. Too much momentum may create issues as well: unstable systems can produce oscillations that grow in magnitude, in which case one needs to add decay terms and so on. It's just physics applied to neural net training, or numerical optimization in general.
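The "too much momentum" point can be illustrated with a small sketch (a toy quadratic of my own choosing): with the same learning rate, a moderate momentum factor settles quickly at the minimum of f(x) = x², while a very large one keeps the iterate swinging around it for a long time.

```python
# Heavy-ball momentum on f(x) = x**2 (gradient 2x), same learning rate,
# two momentum factors: a moderate one converges quickly, an excessive
# one keeps oscillating around the minimum for many iterations.

def run(mu, steps=100, eta=0.1):
    x, v = 1.0, 0.0
    trail = []
    for _ in range(steps):
        v = mu * v - eta * 2.0 * x  # momentum update
        x = x + v
        trail.append(x)
    return trail

moderate = run(0.4)
excessive = run(0.99)

print(abs(moderate[-1]))                      # essentially at the minimum
print(max(abs(x) for x in excessive[-20:]))   # still swinging noticeably
```

The oscillation with mu = 0.99 does eventually decay, but only at a rate of roughly sqrt(mu) per step, which is why practitioners add decay or pick mu well below 1.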
In video
This video shows backpropagation runs for different momentum values.
Other interesting posts
How does the momentum term for backpropagation algorithm work?
Hope it helps.
Correct answer by etiennedm on June 16, 2021
As a non-formal and non-thorough definition, you can understand momentum in gradient descent as inertia.
So when you are going down the hill in the optimization problem, you just add "momentum" to the descent, and it helps with things like noise in the data, saddle points, and so on.
For a more thorough analysis see https://towardsdatascience.com/stochastic-gradient-descent-with-momentum-a84097641a5d
This is not specific to Azure; it is common to all neural networks.
Answered by Carlos Mougan on June 16, 2021
Momentum is a technique to damp erratic movement. When the gradient is computed at every iteration, it can point in a totally different direction each time, so the steps trace a zigzag path, which makes training very slow.
To prevent this from happening, momentum stabilizes the movement. You can find more in the following article.
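The zigzag effect can be sketched on an ill-conditioned quadratic (a toy example of my own, not from the article): vanilla gradient descent bounces back and forth across the narrow valley of f(x, y) = 0.5·(x² + 25y²), while momentum damps the bouncing and reaches the minimum much faster.

```python
# Ill-conditioned quadratic: f(x, y) = 0.5 * (x**2 + 25 * y**2).
# Plain gradient descent zigzags along the steep y-direction; momentum damps it.

def grad(x, y):
    return x, 25.0 * y

def vanilla(steps=50, eta=0.0769):
    x, y = 1.0, 1.0
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - eta * gx, y - eta * gy
    return (x * x + y * y) ** 0.5  # distance from the minimum at (0, 0)

def with_momentum(steps=50, eta=0.111, mu=0.444):
    x, y = 1.0, 1.0
    vx, vy = 0.0, 0.0
    for _ in range(steps):
        gx, gy = grad(x, y)
        vx, vy = mu * vx - eta * gx, mu * vy - eta * gy
        x, y = x + vx, y + vy
    return (x * x + y * y) ** 0.5

print(vanilla())        # still noticeably far from the minimum
print(with_momentum())  # essentially at the minimum
```

The step sizes here are close to the theoretically best choices for each method on this quadratic, so the comparison is fair: momentum wins because it averages out the sign-flipping y-component while accumulating speed along the shallow x-direction.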
Answered by Hazarapet Tunanyan on June 16, 2021