Data Science Asked on September 4, 2020
It seems that the Adaptive Moment Estimation (Adam) optimizer nearly always works better (converging faster and more reliably toward a minimum) when minimising the cost function while training neural nets.
Why not always use Adam? Why even bother using RMSProp or momentum optimizers?
Here’s a blog post reviewing a paper that claims SGD generalizes better than Adam.
There is often value in trying more than one method (an ensemble of sorts), because every method has its weaknesses.
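As a rough illustration of how cheap it is to try both, here is a minimal sketch (assuming PyTorch; the model, learning rates, and helper names are placeholders, not part of the original answer) that switches between Adam and SGD with momentum:

```python
import torch
import torch.nn as nn

def make_model():
    # Placeholder architecture; substitute your own network.
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))

def make_optimizer(name, params):
    # Switching optimizers is a one-line change, so it costs little to run
    # one training job with Adam and another with SGD + momentum, then keep
    # whichever generalizes better on a held-out validation set.
    if name == "adam":
        return torch.optim.Adam(params, lr=1e-3)
    if name == "sgd_momentum":
        return torch.optim.SGD(params, lr=1e-2, momentum=0.9)
    raise ValueError(f"unknown optimizer: {name}")

model = make_model()
optimizer = make_optimizer("sgd_momentum", model.parameters())
```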
Correct answer by Christopher Klaus on September 4, 2020
You should also take a look at this post comparing different gradient descent optimizers. As the comparisons there show, Adam is clearly not the best optimizer for every task; on some problems other methods converge faster or to better solutions.
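The figures from that post are not reproduced here, but the kind of comparison it makes can be sketched as follows (a rough illustration assuming PyTorch and synthetic data; none of this code comes from the linked post): train identical models with different optimizers and compare how well each converges.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Synthetic regression data stands in for a real task.
X = torch.randn(512, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(512, 1)

OPTIMIZERS = {
    "SGD":      lambda p: torch.optim.SGD(p, lr=0.05),
    "Momentum": lambda p: torch.optim.SGD(p, lr=0.05, momentum=0.9),
    "RMSProp":  lambda p: torch.optim.RMSprop(p, lr=0.01),
    "Adam":     lambda p: torch.optim.Adam(p, lr=0.01),
}

def train(opt_name, epochs=200):
    # Fresh model per run so the comparison isolates the optimizer.
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    optimizer = OPTIMIZERS[opt_name](model.parameters())
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
    return loss.item()

for name in OPTIMIZERS:
    print(f"{name}: final training loss = {train(name):.4f}")
```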
Answered by user50386 on September 4, 2020