Data Science Asked on December 19, 2021
I know how to run gradient descent on a convex function, where there is only one global minimum. I also know methods for handling non-convex functions.
What is really bothering me is how gradient descent behaves on a non-convex function like $y = -x^2$: here descent would go to minus infinity instead of converging to the global maximum.
This seems to contradict the idea of getting stuck in a saddle point for a function like $x^2 - y^2$.
The calculation of the gradient is the same no matter which optimization algorithm you use. Gradient descent is a method to find (local) minima of a function. If no such minimum exists, any algorithm that looks for one will return wrong results or never converge. $-x^2$ only has a maximum, so looking for a minimum does not make much sense; to find the maximum you would need to run gradient ascent, i.e. go uphill.
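As a minimal sketch of this point (assuming a fixed step size of 0.1 and a starting point of $x = 1$, both chosen only for illustration): gradient descent on $f(x) = -x^2$ moves further and further from the origin, while gradient ascent on the same function converges to the maximum at $x = 0$.

```python
def grad(x):
    # gradient of f(x) = -x^2 is f'(x) = -2x
    return -2 * x

x_desc, x_asc = 1.0, 1.0
lr = 0.1
for _ in range(50):
    x_desc -= lr * grad(x_desc)  # gradient descent: step against the gradient, moves away from 0
    x_asc += lr * grad(x_asc)    # gradient ascent: step along the gradient, moves toward 0

print(x_desc)  # grows without bound: descent diverges because no minimum exists
print(x_asc)   # approaches 0, the global maximum
```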
Getting stuck in a saddle point is a different issue. For $f(x,y) = x^2 - y^2$ the gradient is $\nabla f(x,y) = [2x, -2y]^T$ and thus equals zero at $x=y=0$, so gradient ascent/descent would just sit there and not move. But $f(0,0) = 0$ is neither a minimum nor a maximum ($f$ actually doesn't have either of these).
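A small sketch of the saddle-point behaviour (again with an illustrative step size of 0.1): started exactly at the origin, gradient descent never moves because the gradient is zero there, while an arbitrarily small perturbation in $y$ is pushed away from the saddle.

```python
def grad(x, y):
    # gradient of f(x, y) = x^2 - y^2 is [2x, -2y]
    return 2 * x, -2 * y

def descend(x, y, lr=0.1, steps=100):
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

print(descend(0.0, 0.0))   # stays at (0.0, 0.0): the gradient is zero, so no update happens
print(descend(0.0, 1e-6))  # the y-coordinate grows: a tiny perturbation escapes the saddle
```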
Answered by matthiaw91 on December 19, 2021