TransWikia.com

Stochastic Gradient Descent Does Not Converge to a Maximum Point

Mathematics Asked on November 2, 2021

Let $n\in\mathbb N$, let $B$ be an $n$-dimensional Brownian motion, $\sigma_t$ a positive (deterministic) càglàd (LCRL) function in $\mathcal L^2([0,\infty[)$ and $f\in\mathcal C^2(\mathbb R^n\to\mathbb R)$ a nonnegative function attaining its global minimum at at least one point. Further, $f$ has a Lipschitz continuous gradient and $\lim_{\|x\|\to\infty}f(x)=\infty$. We define the stochastic process $X$ by the SDE
$$
\begin{aligned}
dX_t&=-\nabla f(X_t)\,dt+\sigma_t\,dB_t,\qquad t\geq0,\\
X_0&=\xi.
\end{aligned}
$$

I have already proved that a unique solution exists, that $\nabla f(X_t)$ converges a.s. to zero, and that $f(X_t)$ converges a.s. to a finite value. Looking at the SDE, it is intuitively obvious that this limit cannot correspond to a maximum point of $f$: the process is nowhere constant (which is easily proved), so the first term pushes it away from a maximum point. But I cannot derive a contradiction from the assumption that it is a maximum point. Can anyone give a hint on how to prove it?
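The heuristic above can be illustrated numerically. The following is only a sketch, not a proof: it uses an Euler-Maruyama discretisation in one dimension, and the objective `f`, the schedule `sigma`, and the step size are hypothetical choices made for this example. The chosen $f(x)=x^2/2+1+\cos(2x)$ is nonnegative, coercive, has a Lipschitz gradient, a local maximum at $x=0$, and minima near $x=\pm1.237$; the chosen $\sigma_t=1/(1+t)$ is square-integrable.

```python
import math
import random


def f(x):
    # Hypothetical test objective: nonnegative, coercive, Lipschitz gradient,
    # local maximum at x = 0, minima near x = +-1.237.
    return 0.5 * x * x + 1.0 + math.cos(2.0 * x)


def grad_f(x):
    # Derivative of the objective above.
    return x - 2.0 * math.sin(2.0 * x)


def sigma(t):
    # Hypothetical noise schedule; square-integrable on [0, inf).
    return 1.0 / (1.0 + t)


def simulate(x0, T=30.0, dt=0.01, seed=0, noise=True):
    """Euler-Maruyama discretisation of dX = -f'(X) dt + sigma_t dB."""
    rng = random.Random(seed)
    x, t = x0, 0.0
    n_steps = int(round(T / dt))
    for _ in range(n_steps):
        # Brownian increment over one step has standard deviation sqrt(dt).
        dB = rng.gauss(0.0, math.sqrt(dt)) if noise else 0.0
        x += -grad_f(x) * dt + sigma(t) * dB
        t += dt
    return x


if __name__ == "__main__":
    # Start exactly at the local maximum x = 0 and compare with/without noise.
    print("no noise:", simulate(0.0, noise=False))
    for seed in range(5):
        x_T = simulate(0.0, seed=seed)
        print(seed, round(x_T, 3), round(grad_f(x_T), 3))
```

Without noise the discretised path started at the maximum stays there, while with noise the sampled paths should leave $x=0$ and settle near one of the two minima. This matches the intuition, but says nothing about events of small positive probability.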


I have already found an example in which the process $X$ gets stuck in a saddle point with positive probability. So it is not possible to prove directly that it converges to a minimum point.

If $\sigma$ were zero, this would be easy to show, since we would then be in the deterministic case, in which $f(X_t)$ is differentiable. I also tried to approximate the process $X$ by the process
$$
\begin{aligned}
dY^{t_0}_t&=-\nabla f(Y^{t_0}_t)\,dt,\qquad t\geq t_0,\\
Y^{t_0}_{t_0}&=X_{t_0},
\end{aligned}
$$

of which I know that it cannot converge to a maximum, since $f(Y^{t_0}_t)$ is monotonically decreasing. But this was not successful.
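
The monotonicity used for the deterministic flow follows from the chain rule (a short check, written for general $n$):
$$
\frac{d}{dt}f(Y^{t_0}_t)=\nabla f(Y^{t_0}_t)\cdot\frac{d}{dt}Y^{t_0}_t=-\|\nabla f(Y^{t_0}_t)\|^2\leq0,\qquad t\geq t_0.
$$
So $f(Y^{t_0}_t)$ is nonincreasing, and strictly decreasing as long as $\nabla f(Y^{t_0}_t)\neq0$.
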
A further idea is this (for the one-dimensional case): since we know that $f(X_t)$ converges a.s., denote its limit by $Y$. Further, we know that $\nabla f(X_t)\to0$ a.s. Assume there is a set of positive measure on which $Y$ corresponds to a maximum point. We now consider such a path. With Itô we get
$$
\begin{aligned}
(f(X_t)-Y)^2={}&(f(X_0)-Y)^2+\int_0^t\Big[(f(X_s)-Y)\big(\sigma_s^2 f''(X_s)-2f'(X_s)^2\big)+\sigma_s^2 f'(X_s)^2\Big]\,\mathrm ds\\
&+2\int_0^t \sigma_s f'(X_s)(f(X_s)-Y)\,\mathrm dB_s
\end{aligned}
$$

Now for $t$ larger than some $t_0$ we can assume that the integrand in the first integral is positive, since the process is a.s. not constant. As the LHS has to decrease, I want to derive a contradiction. But I cannot control the Itô integral. I know that it is an $\mathcal L^2$ martingale and thus convergent, but I do not see that this is enough. In any case, a more general approach than using the martingale property would be desirable (but is not necessary).

If further assumptions are needed to show the desired result, we may assume them. Any help would be appreciated.
