Data Science Asked on December 19, 2021
In this paper, the authors give an equation that uses the score function estimator to estimate the gradient of an expectation. How did they derive it?
This is simply a special case (where $p_\psi = N(0,1)$) of the general gradient estimator for Natural Evolution Strategies, which is proved in a separate reference.
Outline of derivation based on the general formula for the gradient estimator:
$$\nabla_\psi E_{\theta \sim p_\psi} \left[ F(\theta) \right] = E_{\theta \sim p_\psi} \left[ F(\theta) \nabla_\psi \log(p_\psi(\theta)) \right]$$
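For completeness, the general formula follows from the log-derivative trick. This is only a brief sketch, assuming the gradient and the integral can be interchanged and that $p_\psi(\theta) > 0$ wherever the logarithm is taken:

$$\begin{align}
\nabla_\psi E_{\theta \sim p_\psi} \left[ F(\theta) \right]
&= \nabla_\psi \int F(\theta) \, p_\psi(\theta) \, d\theta \\
&= \int F(\theta) \, \nabla_\psi p_\psi(\theta) \, d\theta \\
&= \int F(\theta) \, p_\psi(\theta) \, \nabla_\psi \log(p_\psi(\theta)) \, d\theta \\
&= E_{\theta \sim p_\psi} \left[ F(\theta) \nabla_\psi \log(p_\psi(\theta)) \right]
\end{align}$$

The third line uses $\nabla_\psi p_\psi(\theta) = p_\psi(\theta) \nabla_\psi \log(p_\psi(\theta))$.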
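Now let $\theta$ denote the mean of the Gaussian and $\psi$ the sampled point, with standard deviation $\sigma$ (the roles of the two symbols are swapped relative to the general formula). If
$$\epsilon \sim \mathbb{N}(0, 1) = \frac{1}{\sqrt{2 \pi}}e^{-\frac{\epsilon^2}{2}}$$
then
$$\psi = \theta + \sigma \epsilon \sim \mathbb{N}(\theta, \sigma) = \frac{1}{\sigma\sqrt{2 \pi}}e^{-\frac{(\psi-\theta)^2}{2\sigma^2}}$$
Thus: $\psi = \theta + \sigma \epsilon \sim \mathbb{N}(\theta, \sigma) \Longleftrightarrow \epsilon = \frac{\psi-\theta}{\sigma} \sim \mathbb{N}(0,1)$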
So:
$$\begin{align}
\nabla_\theta E_{\psi \sim N(\theta,\sigma)} \left[ F(\theta + \sigma \epsilon) \right]
&= E_{\psi \sim N(\theta,\sigma)} \left[ F(\theta + \sigma \epsilon) \nabla_\theta \left(-\frac{(\psi-\theta)^2}{2\sigma^2}\right) \right] \\
&= E_{\epsilon \sim N(0,1)} \left[ F(\theta + \sigma \epsilon) \nabla_\epsilon \left(-\frac{\epsilon^2}{2}\right) \frac{d\left(\frac{\psi-\theta}{\sigma}\right)}{d\theta} \right] \\
&= \frac{1}{\sigma} E_{\epsilon \sim N(0,1)} \left[ F(\theta + \sigma \epsilon) \epsilon \right] \\
&= \nabla_\theta E_{\epsilon \sim N(0,1)} \left[ F(\theta + \sigma \epsilon) \right]
\end{align}$$
Note: scalar variables are used in the steps above for simplicity, but the derivation extends directly to vector variables.
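The result can be sanity-checked numerically. The following is a minimal sketch in the scalar case, not code from the paper: the test function `F(x) = x**2`, the helper name `es_gradient`, and the values of `theta`, `sigma` and the sample count are all assumptions chosen for illustration. For this `F`, $E[F(\theta + \sigma\epsilon)] = \theta^2 + \sigma^2$, so the exact gradient with respect to $\theta$ is $2\theta$.

```python
import numpy as np


def es_gradient(F, theta, sigma, n_samples, rng):
    """Monte Carlo estimate of (1/sigma) * E[F(theta + sigma*eps) * eps], eps ~ N(0, 1)."""
    eps = rng.standard_normal(n_samples)          # eps ~ N(0, 1)
    return np.mean(F(theta + sigma * eps) * eps) / sigma


def F(x):
    # Hypothetical test function; E[F(theta + sigma*eps)] = theta**2 + sigma**2.
    return x ** 2


rng = np.random.default_rng(0)
theta, sigma = 1.5, 0.5                           # illustrative values only

estimate = es_gradient(F, theta, sigma, 200_000, rng)
exact = 2 * theta                                 # d/dtheta (theta**2 + sigma**2)

print(f"estimate = {estimate:.3f}, exact = {exact:.3f}")
```

The estimate is noisy, so it only approaches the exact value as the number of samples grows. Note that `F` is never differentiated: the estimator needs only function evaluations.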
Answered by Nikos M. on December 19, 2021