TransWikia.com

What is momentum in neural networks?

Data Science Asked on June 16, 2021

While using the "Two-Class Neural Network" module in Azure ML, I encountered the "Momentum" property. The documentation, which is not very clear, says:

For The momentum, type a value to apply during learning as a weight on
nodes from previous iterations.

That is still not very clear to me. Can someone please explain?

3 Answers

Momentum in neural networks is a variant of stochastic gradient descent. It replaces the gradient with a momentum term, which is an aggregate of past gradients, as very well explained here.
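To make "aggregate of gradients" concrete, here is a small sketch (the gradient values are made up purely for illustration) showing that the recursive momentum accumulation is exactly a geometrically weighted sum of all past gradients:

```python
mu = 0.9                        # momentum factor
grads = [3.0, -1.0, 2.0, 0.5]   # made-up gradient sequence

# Recursive form: v_t = mu * v_{t-1} + g_t
v = 0.0
for g in grads:
    v = mu * v + g

# Closed form: a geometrically weighted sum of past gradients,
# where older gradients are discounted by higher powers of mu
v_closed = sum(mu ** (len(grads) - 1 - i) * g for i, g in enumerate(grads))

assert abs(v - v_closed) < 1e-12  # both forms agree
```

So the most recent gradient counts fully, the one before it is weighted by mu, the one before that by mu squared, and so on.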

It is also the common name given to the momentum factor, as in your case.

Maths

The momentum factor is a coefficient α applied to an extra term in the weight update:

Δw(t) = -η ∇E(w(t)) + α Δw(t-1)

where η is the learning rate, ∇E is the gradient of the error, and Δw(t-1) is the previous weight update.

Note: the original answer illustrated this with an image from a Visual Studio Magazine post.
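As a minimal sketch of this weight update (the learning rate, momentum value, and toy loss f(w) = w**2 below are illustrative assumptions, not Azure ML's internals):

```python
def grad(w):
    # Gradient of the toy loss f(w) = w**2
    return 2.0 * w

lr = 0.1        # learning rate (eta)
mu = 0.9        # momentum factor (alpha) -- the "Momentum" property
w = 5.0         # initial weight
delta = 0.0     # previous weight update, Delta w(t-1)

for _ in range(200):
    delta = -lr * grad(w) + mu * delta  # the extra momentum term is mu * delta
    w = w + delta

assert abs(w) < 0.01  # converged near the minimum at w = 0
```

With mu = 0, this reduces to plain gradient descent; larger mu values let more of each previous step carry over into the next one.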

Advantages

Among other benefits, momentum is known to speed up learning and to help avoid getting stuck in local minima.

Intuition

As nicely explained in this Quora post, the idea of momentum comes from physics:

Momentum is a physical property that enables an object with mass to continue along its trajectory even when an external opposing force is applied; this means it can overshoot. For example, if one speeds up a car and then suddenly hits the brakes, the car will skid and stop only after a short distance, overshooting the mark on the ground.

The same concept applies to neural networks: during training, the update direction tends to resist change when momentum is added to the update scheme. When the neural net approaches a shallow local minimum, it is like applying the brakes, but not sharply enough to instantly change the update direction and magnitude. Hence, neural nets trained this way will overshoot past smaller local minima and only stop in a deeper minimum.

Thus momentum helps neural nets escape local minima so that a deeper, more important minimum can be found. Too much momentum can create issues as well: unstable systems can develop oscillations that grow in magnitude, in which case one needs to add decay terms and so on. It is just physics applied to neural net training, or to numerical optimization in general.

In video

This video shows backpropagation for different momentum values.

Other interesting posts

How does the momentum term for backpropagation algorithm work?

Hope it helps.

Correct answer by etiennedm on June 16, 2021

Informally (and without being thorough), you can understand momentum in gradient descent as inertia.

So when you are going downhill in the optimization problem, you add "momentum" to the descent, and it helps with things like noise in the data, saddle points, and so on.

For a more thorough analysis, see https://towardsdatascience.com/stochastic-gradient-descent-with-momentum-a84097641a5d

This is not specific to Azure; it is common to all neural networks.

Answered by Carlos Mougan on June 16, 2021

Momentum is a technique to dampen oscillating updates. The gradient computed at each iteration can point in a very different direction from the previous one, so the steps trace a zigzag path, which makes training very slow.

To prevent this from happening, momentum stabilizes the movement. You can find more in the following article.
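As a hedged sketch of this zigzag effect (the quadratic, step size, and momentum value below are made up for illustration), one can count how often the steep coordinate flips sign under plain gradient descent versus momentum:

```python
def grad(x, y):
    # Gradient of the ill-conditioned quadratic f(x, y) = 0.5 * (x**2 + 25 * y**2);
    # the y direction is much steeper, which is what causes the zigzag.
    return x, 25.0 * y

def sign_flips(ys):
    # Number of times consecutive y values change sign
    return sum(1 for a, b in zip(ys, ys[1:]) if a * b < 0)

lr, mu, steps = 0.07, 0.9, 50

# Plain gradient descent: y overshoots the valley floor and zigzags every step.
x, y = 1.0, 1.0
ys_gd = [y]
for _ in range(steps):
    gx, gy = grad(x, y)
    x, y = x - lr * gx, y - lr * gy
    ys_gd.append(y)

# Gradient descent with momentum: the velocity averages out opposing gradients.
x, y, vx, vy = 1.0, 1.0, 0.0, 0.0
ys_mom = [y]
for _ in range(steps):
    gx, gy = grad(x, y)
    vx, vy = mu * vx - lr * gx, mu * vy - lr * gy
    x, y = x + vx, y + vy
    ys_mom.append(y)

print(sign_flips(ys_gd), sign_flips(ys_mom))  # momentum flips direction far less often
```

Because successive gradients in the steep direction largely cancel inside the velocity, the momentum run changes direction far less often than plain gradient descent.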

Answered by Hazarapet Tunanyan on June 16, 2021

