How could I understand the Self-critical sequence training (SCST) model?

Cross Validated Asked on September 22, 2020

While reading this awesome paper I got stuck trying to understand the SCST model as depicted below:
My questions are:

  1. Are $h'_0$ and $h_0$ the same? Likewise for $c'_0$ and $c_0$. If not, what does each of them represent?
  2. Do “current model” and “reference algorithm” (the second one depicted) refer to the same model as the sampling model (the first one depicted above), just used the way it would be at test time?

Thanks very much. Any suggestions are highly appreciated.

One Answer

Yes, the inputs are the same: $h'_0 = h_0$ and $c'_0 = c_0$.

The so-called "current model" is literally the current model: both sequences come from the same decoder. One is sampled from the softmax distribution, and the other is obtained greedily by taking the most probable token at each step.
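To make the two decoding modes concrete, here is a minimal Python sketch. The toy `next_token_probs` table, the three-token vocabulary, and all helper names are made up for illustration; a real decoder would compute the distribution from its LSTM state and the previously emitted token, whereas this stand-in depends only on the prefix length.

```python
import random

# Hypothetical three-token vocabulary, for illustration only.
VOCAB = ["a", "b", "<eos>"]

def next_token_probs(prefix):
    # Toy stand-in for the decoder's softmax output.
    # Assumption: it depends only on how many tokens were emitted so far.
    table = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.1, 0.8]]
    return table[min(len(prefix), len(table) - 1)]

def decode(pick, max_len=5):
    """Run the same decoder, choosing each token index with pick(probs)."""
    seq = []
    for _ in range(max_len):
        probs = next_token_probs(seq)
        token = VOCAB[pick(probs)]
        seq.append(token)
        if token == "<eos>":
            break
    return seq

def greedy_pick(probs):
    # Greedy decoding: index of the most probable token.
    return max(range(len(probs)), key=probs.__getitem__)

def make_sample_pick(rng):
    # Sampling: draw a token index in proportion to its probability.
    return lambda probs: rng.choices(range(len(probs)), weights=probs, k=1)[0]

greedy_seq = decode(greedy_pick)                          # baseline sequence
sampled_seq = decode(make_sample_pick(random.Random(0)))  # exploratory sequence
```

Both calls go through the same `decode` function and the same distributions; only the rule for picking a token differs, which is the point the answer makes about the two sequences.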

The trick is that if $(r(\hat y) - r(y^s))$ is positive, meaning the sampled sequence $y^s$ scores worse than the greedy one $\hat y$ (both measured against the reference sequence), then minimizing $L_{rl}$ amounts to reducing the probability that the sampling mechanism produces such a worse-than-baseline sequence next time.
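As a rough numerical illustration of this sign argument, the sketch below assumes the loss has the form $L_{rl} = (r(\hat y) - r(y^s)) \sum_t \log p(y^s_t \mid y^s_{<t}, x)$. That form is my assumption about the paper's definition; the function name `scst_loss`, the token probabilities, and the reward values are all hypothetical.

```python
import math

def scst_loss(sample_token_probs, reward_sample, reward_greedy):
    """Assumed form: (r(y_hat) - r(y^s)) * sum_t log p(y^s_t | ...)."""
    log_prob_sample = sum(math.log(p) for p in sample_token_probs)
    return (reward_greedy - reward_sample) * log_prob_sample

# Per-token probabilities the model assigned to the sampled sequence (made up).
probs = [0.5, 0.4, 0.9]

# Case 1: the sample scores worse than the greedy baseline.
loss_worse = scst_loss(probs, reward_sample=0.2, reward_greedy=0.6)

# Same case, but with the sampled sequence made less likely (first token 0.25).
loss_worse_less_likely = scst_loss([0.25, 0.4, 0.9], 0.2, 0.6)

# Case 2: the sample scores better than the greedy baseline.
loss_better = scst_loss(probs, reward_sample=0.8, reward_greedy=0.6)

# Same case, but with the sampled sequence made more likely (first token 0.75).
loss_better_more_likely = scst_loss([0.75, 0.4, 0.9], 0.8, 0.6)
```

In case 1 the loss goes down when the sampled sequence becomes less probable, and in case 2 it goes down when the sampled sequence becomes more probable, so gradient descent pushes the model away from samples that underperform the greedy baseline and toward samples that beat it.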

Answered by Lerner Zhang on September 22, 2020

