Cross Validated Asked on September 22, 2020
While reading this awesome paper, I got stuck trying to understand the SCST model as depicted below:
My questions are:
Thanks very much. Any suggestions are highly appreciated.
The inputs are the same.
The so-called "current model" is literally the current model: both sequences (one sampled from the softmax distribution, the other taken greedily from the same distribution) are produced by the same decoder.
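To make "same decoder, two decoding modes" concrete, here is a minimal sketch. The bigram table `TOY_LOGITS` is a hypothetical stand-in for the real captioning decoder (it is not from the paper), and `decode` is an illustrative helper. The only difference between the greedy sequence $\hat{y}$ and the sampled sequence $y^s$ is whether each step takes the argmax or draws from the softmax; the parameters are identical.

```python
import math
import random

# Hypothetical toy "decoder": row i holds the next-token logits after token i
# (a bigram table standing in for an LSTM/attention decoder). Token 0 doubles as <start>.
TOY_LOGITS = [
    [0.1, 2.0, 0.5, 0.3],
    [0.2, 0.1, 1.5, 1.0],
    [0.5, 0.3, 0.1, 2.2],
    [1.8, 0.4, 0.6, 0.1],
]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(table, length, greedy, rng):
    """Decode `length` tokens from the SAME model, either greedily or by sampling.
    Returns the token list and the summed log-probability of the chosen tokens."""
    prev, tokens, log_prob = 0, [], 0.0
    for _ in range(length):
        probs = softmax(table[prev])
        if greedy:
            tok = max(range(len(probs)), key=lambda k: probs[k])  # argmax -> y_hat
        else:
            tok = rng.choices(range(len(probs)), weights=probs, k=1)[0]  # draw -> y^s
        tokens.append(tok)
        log_prob += math.log(probs[tok])
        prev = tok  # feed the chosen token back in, as a real decoder would
    return tokens, log_prob

rng = random.Random(0)
y_hat, _ = decode(TOY_LOGITS, 3, greedy=True, rng=rng)
y_s, log_p_s = decode(TOY_LOGITS, 3, greedy=False, rng=rng)
print("greedy:", y_hat, "sampled:", y_s, "log p(y^s):", round(log_p_s, 3))
```

The greedy sequence is deterministic for fixed parameters, so it acts as a baseline; the sampled sequence varies from run to run, and only its log-probability is needed for the gradient.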
The trick is that if $r(\hat{y}) - r(y^s)$ is positive, the sampled sequence scored worse than the greedy one (both are scored against the reference sequences), so minimizing $L_{rl}$ amounts to reducing the probability that the sampling mechanism produces such a below-baseline sequence next time.
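A minimal numeric check of that sign argument, assuming $L_{rl}$ is the usual SCST surrogate $L_{rl} = (r(\hat{y}) - r(y^s))\log p_\theta(y^s)$ with the reward gap treated as a constant (no gradient flows through the greedy baseline). The rewards are plain numbers here (in practice something like CIDEr computed elsewhere), and `scst_loss`/`scst_step` are hypothetical helpers that apply one gradient-descent step to a single softmax decision rather than a full sequence.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def scst_loss(log_p_sample, r_sample, r_greedy):
    """Surrogate L_rl = (r(y_hat) - r(y^s)) * log p(y^s); the reward gap is a constant."""
    return (r_greedy - r_sample) * log_p_sample

def scst_step(logits, sampled, r_sample, r_greedy, lr=0.5):
    """One gradient-descent step on L_rl for a single softmax decision (toy, one token)."""
    probs = softmax(logits)
    gap = r_greedy - r_sample  # positive when the sample did worse than the greedy baseline
    # d log p(sampled) / d z_j = 1[j == sampled] - p_j, scaled by the constant reward gap
    grads = [gap * ((1.0 if j == sampled else 0.0) - p) for j, p in enumerate(probs)]
    return [z - lr * g for z, g in zip(logits, grads)]

logits = [0.0, 0.0, 0.0]  # uniform start: p(token 2) = 1/3
worse = scst_step(logits, sampled=2, r_sample=0.4, r_greedy=0.9)   # sample below baseline
better = scst_step(logits, sampled=2, r_sample=0.9, r_greedy=0.4)  # sample above baseline
print(round(softmax(worse)[2], 3), round(softmax(better)[2], 3))
```

When the sample underperforms the greedy baseline, its probability drops below 1/3; when it outperforms, its probability rises; when the rewards tie, the logits do not move.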
Answered by Lerner Zhang on September 22, 2020