How is this score function estimator derived?

Question

In this paper (https://arxiv.org/pdf/1703.03864.pdf) they have this equation, where they use the score function estimator to estimate the gradient of an expectation. How did they derive it?

Nikos M. · Accepted Answer

This is simply a special case (where the search distribution $p_\psi$ is Gaussian) of the general gradient estimator for Natural Evolution Strategies (the general result is proved in the NES literature):

Outline of derivation based on the general formula for the gradient estimator:
$$\nabla_\psi E_{\theta \sim p_\psi} \left[ F(\theta) \right] = E_{\theta \sim p_\psi} \left[ F(\theta) \nabla_\psi \log p_\psi(\theta) \right]$$
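For completeness, the general formula above follows from the log-derivative trick. This is only a sketch: it assumes that the gradient and the integral can be interchanged and that $p_\psi(\theta) > 0$ wherever the logarithm is taken.
$$\begin{align}
\nabla_\psi E_{\theta \sim p_\psi} \left[ F(\theta) \right] &= \nabla_\psi \int F(\theta) \, p_\psi(\theta) \, d\theta \\
&= \int F(\theta) \, \nabla_\psi p_\psi(\theta) \, d\theta \\
&= \int F(\theta) \, p_\psi(\theta) \, \nabla_\psi \log p_\psi(\theta) \, d\theta \\
&= E_{\theta \sim p_\psi} \left[ F(\theta) \nabla_\psi \log p_\psi(\theta) \right]
\end{align}$$
The third line uses $\nabla_\psi p_\psi(\theta) = p_\psi(\theta) \nabla_\psi \log p_\psi(\theta)$.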
If
$$\epsilon \sim \mathcal{N}(0, 1) = \frac{1}{\sqrt{2 \pi}}e^{-\frac{\epsilon^2}{2}}$$
then
$$\psi = \theta + \sigma \epsilon \sim \mathcal{N}(\theta, \sigma) = \frac{1}{\sigma\sqrt{2 \pi}}e^{-\frac{(\psi-\theta)^2}{2\sigma^2}}$$
Thus: $\psi = \theta + \sigma \epsilon \sim \mathcal{N}(\theta, \sigma) \Longleftrightarrow \epsilon = \frac{\psi-\theta}{\sigma} \sim \mathcal{N}(0,1)$
So:
$$\begin{align}
\nabla_\theta E_{\psi \sim \mathcal{N}(\theta,\sigma)} \left[ F(\theta + \sigma \epsilon) \right] &= E_{\psi \sim \mathcal{N}(\theta,\sigma)} \left[ F(\theta + \sigma \epsilon) \nabla_\theta \left(-\frac{(\psi-\theta)^2}{2\sigma^2}\right) \right] \\
&= E_{\epsilon \sim \mathcal{N}(0,1)} \left[ F(\theta + \sigma \epsilon) \nabla_\epsilon \left(-\frac{\epsilon^2}{2}\right) \frac{d\left(\frac{\psi-\theta}{\sigma}\right)}{d\theta} \right] \\
&= \frac{1}{\sigma} E_{\epsilon \sim \mathcal{N}(0,1)} \left[ F(\theta + \sigma \epsilon) \epsilon \right] \\
&= \nabla_\theta E_{\epsilon \sim \mathcal{N}(0,1)} \left[ F(\theta + \sigma \epsilon) \right]
\end{align}$$
Note: scalar variables were used in the steps above for simplicity, but the derivation extends readily to vector variables.
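
As a sanity check, the scalar identity can be verified numerically. The following is a minimal sketch, not code from the paper: the function name `score_function_gradient`, the test objective `F(x) = x**2`, and the values of `theta`, `sigma`, and the sample count are all assumptions chosen for illustration. For this objective, $E[F(\theta + \sigma\epsilon)] = \theta^2 + \sigma^2$, so the exact gradient with respect to $\theta$ is $2\theta$.

```python
import random


def score_function_gradient(F, theta, sigma, n_samples=200_000, seed=0):
    """Monte Carlo estimate of d/dtheta E[F(theta + sigma*eps)], eps ~ N(0, 1).

    Uses the identity (1/sigma) * E[F(theta + sigma*eps) * eps].
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        total += F(theta + sigma * eps) * eps
    return total / (n_samples * sigma)


def F(x):
    # Hypothetical test objective: E[F(theta + sigma*eps)] = theta**2 + sigma**2
    return x * x


theta, sigma = 1.5, 0.5
estimate = score_function_gradient(F, theta, sigma)
exact = 2.0 * theta  # derivative of theta**2 + sigma**2 with respect to theta
print(f"estimate = {estimate:.3f}, exact = {exact:.3f}")
```

The estimate should land close to the exact value, with Monte Carlo noise that shrinks as `n_samples` grows. Note that the estimator only evaluates `F` and never differentiates it, which is why it also applies to black-box objectives.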
