TransWikia.com

How is this score function estimator derived?

Data Science Asked on August 13, 2020

In this paper (https://arxiv.org/pdf/1703.03864.pdf) the authors give the equation below, in which they use the score function estimator to estimate the gradient of an expectation. How did they derive it?
Figure: the equation in question. Source: https://arxiv.org/pdf/1703.03864.pdf

One Answer

This is a special case (where $p_\psi = N(0,1)$) of the general gradient estimator for Natural Evolution Strategies, which is proved in a separate reference:


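Outline of the derivation, based on the general formula for the gradient estimator. In the general formula $\theta$ is the sampled variable and $\psi$ parametrizes the distribution; the Gaussian case further down swaps these roles, sampling $\psi$ around the mean $\theta$: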

$$\nabla_\psi E_{\theta \sim p_\psi} \left[ F(\theta) \right] = E_{\theta \sim p_\psi} \left[ F(\theta) \nabla_\psi \log({p_\psi}(\theta)) \right]$$
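
For completeness, the general formula follows from the log-derivative identity $\nabla_\psi p_\psi(\theta) = p_\psi(\theta) \nabla_\psi \log p_\psi(\theta)$. The sketch below assumes that $p_\psi$ is a differentiable density and that the gradient and the integral can be interchanged; neither assumption is stated explicitly in the answer:

$$\begin{align} \nabla_\psi E_{\theta \sim p_\psi} \left[ F(\theta) \right] &= \nabla_\psi \int F(\theta) \, p_\psi(\theta) \, d\theta \\ &= \int F(\theta) \, \nabla_\psi p_\psi(\theta) \, d\theta \\ &= \int F(\theta) \, p_\psi(\theta) \, \nabla_\psi \log p_\psi(\theta) \, d\theta \\ &= E_{\theta \sim p_\psi} \left[ F(\theta) \nabla_\psi \log p_\psi(\theta) \right] \end{align}$$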

If

$$\epsilon \sim \mathbb{N}(0, 1) = \frac{1}{\sqrt{2 \pi}}e^{-\frac{\epsilon^2}{2}}$$

then

$$\psi = \theta + \sigma \epsilon \sim \mathbb{N}(\theta, \sigma) = \frac{1}{\sigma\sqrt{2 \pi}}e^{-\frac{(\psi-\theta)^2}{2\sigma^2}}$$

Thus: $\psi = \theta + \sigma \epsilon \sim \mathbb{N}(\theta, \sigma) \Longleftrightarrow \epsilon = \frac{\psi-\theta}{\sigma} \sim \mathbb{N}(0,1)$

So:

$$\begin{align} \nabla_\theta E_{\psi \sim N(\theta,\sigma)} \left[ F(\theta + \sigma \epsilon) \right] &= E_{\psi \sim N(\theta,\sigma)} \left[ F(\theta + \sigma \epsilon) \nabla_\theta \left(-\frac{(\psi-\theta)^2}{2\sigma^2}\right) \right] \\ &= E_{\epsilon \sim N(0,1)} \left[ F(\theta + \sigma \epsilon) \nabla_\epsilon \left(-\frac{\epsilon^2}{2}\right) \frac{d\left(\frac{\psi-\theta}{\sigma}\right)}{d\theta} \right] \\ &= \frac{1}{\sigma} E_{\epsilon \sim N(0,1)} \left[ F(\theta + \sigma \epsilon) \epsilon \right] \\ &= \nabla_\theta E_{\epsilon \sim N(0,1)} \left[ F(\theta + \sigma \epsilon) \right] \end{align}$$

Note: scalar variables were used in the steps above for simplicity, but the derivation extends readily to vector variables.
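
The result can be checked numerically. The following is a minimal sketch rather than the paper's implementation: it assumes a scalar test function $F(x) = x^2$, for which $E_{\epsilon \sim N(0,1)}[F(\theta + \sigma\epsilon)] = \theta^2 + \sigma^2$ and so the exact gradient with respect to $\theta$ is $2\theta$. The function name `score_function_gradient`, the sample count, and the seed are illustrative choices, not taken from the source.

```python
import random


def score_function_gradient(F, theta, sigma, n_samples=200_000, seed=0):
    """Monte Carlo estimate of d/dtheta E_{eps ~ N(0,1)}[F(theta + sigma*eps)]
    via the score function estimator (1/sigma) * E[F(theta + sigma*eps) * eps]."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)  # eps ~ N(0, 1)
        total += F(theta + sigma * eps) * eps
    return total / (n_samples * sigma)


def F(x):
    # Hypothetical test objective; the exact gradient of E[F] is 2 * theta.
    return x * x


theta, sigma = 1.5, 0.5
estimate = score_function_gradient(F, theta, sigma)
exact = 2.0 * theta
print(f"estimate = {estimate:.3f}, exact = {exact:.3f}")
```

The estimate is noisy because each sample contributes $F(\theta + \sigma\epsilon)\,\epsilon / \sigma$, whose variance grows as $\sigma$ shrinks, so a large sample count is used here to make the agreement visible.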

Correct answer by Nikos M. on August 13, 2020
