Data Science Asked by Darome on August 13, 2020
I was implementing the transformer architecture in tensorflow.
I was following the tutorial : https://www.tensorflow.org/tutorials/text/transformer#setup_input_pipeline
They implement the positional encoding in this way:
angle_rates = 1 / np.power(10000, (2 * (i//2)) / np.float32(d_model))
However, in the paper $i$ is not divided by 2 (i//2). Is this a bug, or what is the reason for this operation?
Thanks.
It's not a bug, although this trick adds some confusion. They would have done better to call their argument $j$ instead of $i$, because what they actually do is take every index $0 \leq j \leq d_{model} - 1$ and compute $PE(pos, j)$. $j$ can be either even or odd, but on the right-hand side of the equation in the paper it is always even, which is why they compute i//2 and multiply back by 2.
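A minimal NumPy sketch may make this concrete (this is an illustrative reimplementation, not the tutorial's exact code): each even/odd pair of dimensions $(2k, 2k+1)$ shares the same angle rate thanks to the j // 2, with sine applied to the even dimensions and cosine to the odd ones.

```python
import numpy as np

def positional_encoding(max_pos, d_model):
    # j runs over all embedding dimensions 0 .. d_model - 1.
    j = np.arange(d_model)[np.newaxis, :]          # shape (1, d_model)
    pos = np.arange(max_pos)[:, np.newaxis]        # shape (max_pos, 1)
    # The pair (2k, 2k+1) shares one rate; that is what j // 2 achieves.
    angle_rates = 1 / np.power(10000, (2 * (j // 2)) / np.float32(d_model))
    angles = pos * angle_rates                     # shape (max_pos, d_model)
    pe = np.zeros((max_pos, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])          # even dims: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])          # odd dims: cosine
    return pe

pe = positional_encoding(4, 8)
```

Because dimensions 0 and 1 share the same angle rate (which is 1 for $j = 0, 1$), `pe[1, 0]` equals $\sin(1)$ and `pe[1, 1]` equals $\cos(1)$, matching the paper's formulas.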
Answered by Michael Solotky on August 13, 2020