Data Science Asked on August 13, 2020
In this paper (https://arxiv.org/pdf/1703.03864.pdf) the authors give an equation in which they use the score function estimator to estimate the gradient of an expectation. How did they derive this?
This is simply a special case (where $p_\psi = N(0,1)$) of the general gradient estimator for Natural Evolution Strategies, which is proved elsewhere in the NES literature.
Outline of derivation based on the general formula for the gradient estimator:
$$\nabla_\psi E_{\theta \sim p_\psi} \left[ F(\theta) \right] = E_{\theta \sim p_\psi} \left[ F(\theta) \nabla_\psi \log({p_\psi}(\theta)) \right]$$
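For completeness, the general formula follows from the log-derivative trick. This is only a sketch, assuming $p_\psi$ is differentiable in $\psi$ and that differentiation and integration can be interchanged (regularity conditions that are not stated in the answer itself):

$$\begin{align}
\nabla_\psi E_{\theta \sim p_\psi} \left[ F(\theta) \right] &= \nabla_\psi \int F(\theta) \, p_\psi(\theta) \, d\theta \\
&= \int F(\theta) \, \nabla_\psi p_\psi(\theta) \, d\theta \\
&= \int F(\theta) \, p_\psi(\theta) \, \nabla_\psi \log p_\psi(\theta) \, d\theta \\
&= E_{\theta \sim p_\psi} \left[ F(\theta) \nabla_\psi \log p_\psi(\theta) \right]
\end{align}$$

The third line uses $\nabla_\psi p_\psi = p_\psi \nabla_\psi \log p_\psi$, which holds wherever $p_\psi(\theta) > 0$.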
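Here $\theta$ plays the role of the distribution parameter and $\psi$ that of the sample (the reverse of the general formula above), with $\sigma$ the standard deviation. If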
$$\epsilon \sim \mathbb{N}(0, 1) = \frac{1}{\sqrt{2 \pi}}e^{-\frac{\epsilon^2}{2}}$$
$$\psi = \theta + \sigma \epsilon \sim \mathbb{N}(\theta, \sigma) = \frac{1}{\sigma\sqrt{2 \pi}}e^{-\frac{(\psi-\theta)^2}{2\sigma^2}}$$
Thus: $\psi = \theta + \sigma \epsilon \sim \mathbb{N}(\theta, \sigma) \Longleftrightarrow \epsilon = \frac{\psi-\theta}{\sigma} \sim \mathbb{N}(0,1)$
So:
$$\begin{align}
\nabla_\theta E_{\psi \sim N(\theta,\sigma)} \left[ F(\theta + \sigma \epsilon) \right] &= E_{\psi \sim N(\theta,\sigma)} \left[ F(\theta + \sigma \epsilon) \nabla_\theta \left(-\frac{(\psi-\theta)^2}{2\sigma^2}\right) \right] \\
&= E_{\epsilon \sim N(0,1)} \left[ F(\theta + \sigma \epsilon) \nabla_\epsilon \left(-\frac{\epsilon^2}{2}\right) \frac{d\left(\frac{\psi-\theta}{\sigma}\right)}{d\theta} \right] \\
&= \frac{1}{\sigma} E_{\epsilon \sim N(0,1)} \left[ F(\theta + \sigma \epsilon) \epsilon \right] \\
&= \nabla_\theta E_{\epsilon \sim N(0,1)} \left[ F(\theta + \sigma \epsilon) \right]
\end{align}$$
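The scalar result can be checked numerically. The snippet below is a minimal sketch, not code from the paper: the helper name `es_gradient_estimate`, the test function $F(x) = x^2$, and the values of `theta`, `sigma`, and the sample count are all assumptions chosen for illustration. For this $F$, $E[F(\theta + \sigma\epsilon)] = \theta^2 + \sigma^2$, so the exact gradient with respect to $\theta$ is $2\theta$.

```python
import random


def es_gradient_estimate(F, theta, sigma, n_samples=200_000, seed=0):
    """Monte Carlo estimate of d/dtheta E[F(theta + sigma*eps)], eps ~ N(0, 1).

    Implements (1/sigma) * mean(F(theta + sigma*eps) * eps) for scalar theta.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    total = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        total += F(theta + sigma * eps) * eps
    return total / (n_samples * sigma)


def F(x):
    # Illustrative objective (an assumption, not from the paper).
    return x ** 2


theta, sigma = 1.5, 0.5
estimate = es_gradient_estimate(F, theta, sigma)
exact = 2 * theta  # since E[(theta + sigma*eps)^2] = theta^2 + sigma^2
print(f"estimate = {estimate:.3f}, exact = {exact:.3f}")
```

The estimate should land close to the exact value, up to Monte Carlo noise. The noise grows as $\sigma$ shrinks, because each term is divided by $\sigma$.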
Note: scalar variables were used in the steps above for simplicity, but the derivation extends easily to vector variables.
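As a sketch of the vector case, assume $\theta \in \mathbb{R}^n$ and $\epsilon \sim N(0, I)$, so that $\psi = \theta + \sigma\epsilon$ has log-density $-\frac{\|\psi-\theta\|^2}{2\sigma^2}$ up to an additive constant. The same steps then give:

$$\nabla_\theta \left(-\frac{\|\psi-\theta\|^2}{2\sigma^2}\right) = \frac{\psi-\theta}{\sigma^2} = \frac{\epsilon}{\sigma} \quad \Longrightarrow \quad \nabla_\theta E_{\epsilon \sim N(0,I)} \left[ F(\theta + \sigma \epsilon) \right] = \frac{1}{\sigma} E_{\epsilon \sim N(0,I)} \left[ F(\theta + \sigma \epsilon) \, \epsilon \right]$$

Here the gradient and $\epsilon$ are both vectors in $\mathbb{R}^n$, and $F$ is still scalar-valued.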
Correct answer by Nikos M. on August 13, 2020