Data Science Asked on September 4, 2020
It seems that the Adaptive Moment Estimation (Adam) optimizer nearly always works better (converging faster and more reliably toward a minimum) when minimising the cost function while training neural nets.
Why not always use Adam? Why even bother using RMSProp or momentum optimizers?
Here’s a blog post reviewing a paper that claims SGD generalizes better than Adam.
There is often value in trying more than one method (an ensemble of sorts), because every method has its weaknesses.
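As a rough illustration of how cheap it is to try both, here is a minimal sketch (assuming PyTorch; the model, learning rates, and helper names are placeholders, not part of the original answer) that switches between Adam and SGD with momentum:

```python
import torch
import torch.nn as nn

def make_model():
    # Placeholder architecture; substitute your own network.
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))

def make_optimizer(name, params):
    # Switching optimizers is a one-line change, so it costs little to run
    # one training job with Adam and another with SGD + momentum, then keep
    # whichever generalizes better on a held-out validation set.
    if name == "adam":
        return torch.optim.Adam(params, lr=1e-3)
    if name == "sgd_momentum":
        return torch.optim.SGD(params, lr=1e-2, momentum=0.9)
    raise ValueError(f"unknown optimizer: {name}")

model = make_model()
optimizer = make_optimizer("sgd_momentum", model.parameters())
```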
Correct answer by Christopher Klaus on September 4, 2020
You should also take a look at this post comparing different gradient descent optimizers. As the comparisons there show, Adam is clearly not the best optimizer for every task; on some problems other methods converge faster or to better solutions.
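The figures from that post are not reproduced here, but the kind of comparison it makes can be sketched as follows (a rough illustration assuming PyTorch and synthetic data; none of this code comes from the linked post): train identical models with different optimizers and compare how well each converges.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Synthetic regression data stands in for a real task.
X = torch.randn(512, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(512, 1)

OPTIMIZERS = {
    "SGD":      lambda p: torch.optim.SGD(p, lr=0.05),
    "Momentum": lambda p: torch.optim.SGD(p, lr=0.05, momentum=0.9),
    "RMSProp":  lambda p: torch.optim.RMSprop(p, lr=0.01),
    "Adam":     lambda p: torch.optim.Adam(p, lr=0.01),
}

def train(opt_name, epochs=200):
    # Fresh model per run so the comparison isolates the optimizer.
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    optimizer = OPTIMIZERS[opt_name](model.parameters())
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
    return loss.item()

for name in OPTIMIZERS:
    print(f"{name}: final training loss = {train(name):.4f}")
```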
Answered by user50386 on September 4, 2020