TransWikia.com

Why do we only care about convex functions when doing Gradient Descent/SGD?

Data Science Asked on February 21, 2021

I mean I know why we specifically care about convex functions: it’s because their local minimum are also global, and so you just have to "follow a path which goes down" to find the minima of the function.

However, there are also functions which are not convex, but for which local minima are also global minima, for example, a function which looks like this: enter image description here

Isn’t there a way to characterize every function which "works well" with gradient descent? Something like "if f has a local minima, then it’s also a global minima", which would be a weaker hypothesis than convex?

PS: When writing this, I also realize that we have a problem with functions like $x rightarrow{} x^3$, for which gradient descent could stop on a "plateau". Anyway, my question remains the same: Isn’t it possible to define a more general class of functions for which gradient descent works better than the convex functions?

One Answer

Probably in the 1-dimensional case this more precise class of functions you aim at would be something like:

Piece-wise monotonous (thus approachable with gradient descent locally) with up to one flip (thus you know that one optimum would be where the pieces meet, the other would be at the extremes)

However, even if one is able to come up with such a definition for the simpler space of 1-dimensional functions, it becomes much more complicated to come up with a generalisation that would be valid in the more generic multidimensional case.

Despite this difficulty, the effort still might be worthwhile if there's going to be a distinctive benefit of doing it. In this particular moment, I personally wonder if this might be the case. Surely, it depends on the context of your problem. Wouldn't a possible approach be to break down your problem into steps:

  1. Try to break down your surfaces/functions in piecewise convex
  2. Apply gradient descent on the pieces
  3. Compare the optima that you have identified

Answered by mapto on February 21, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP