Data Science Asked by Darome on August 13, 2020
I was implementing the transformer architecture in tensorflow.
I was following the tutorial : https://www.tensorflow.org/tutorials/text/transformer#setup_input_pipeline
They implement the positional encoding in this way:
angle_rates = 1 / np.power(10000, (2 * (i//2)) / np.float32(d_model))
However, in the paper $i$ is not divided by 2 (i//2). Is this a bug, or what is the reason for this operation?
Thanks.
It's not a bug, although this trick adds some confusion. They would have done better to call their argument $j$ instead of $i$, because what they actually do is take every index $0 \leq j \leq d_{model} - 1$ and compute $PE(pos, j)$. $j$ can be either even or odd, but on the right-hand side of the equation in the paper it is always even, which is why they compute i//2 and multiply back by 2.
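A minimal NumPy sketch may make this concrete (this is an illustrative reimplementation, not the tutorial's exact code): each even/odd pair of dimensions $(2k, 2k+1)$ shares the same angle rate thanks to the j // 2, with sine applied to the even dimensions and cosine to the odd ones.

```python
import numpy as np

def positional_encoding(max_pos, d_model):
    # j runs over all embedding dimensions 0 .. d_model - 1.
    j = np.arange(d_model)[np.newaxis, :]          # shape (1, d_model)
    pos = np.arange(max_pos)[:, np.newaxis]        # shape (max_pos, 1)
    # The pair (2k, 2k+1) shares one rate; that is what j // 2 achieves.
    angle_rates = 1 / np.power(10000, (2 * (j // 2)) / np.float32(d_model))
    angles = pos * angle_rates                     # shape (max_pos, d_model)
    pe = np.zeros((max_pos, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])          # even dims: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])          # odd dims: cosine
    return pe

pe = positional_encoding(4, 8)
```

Because dimensions 0 and 1 share the same angle rate (which is 1 for $j = 0, 1$), `pe[1, 0]` equals $\sin(1)$ and `pe[1, 1]` equals $\cos(1)$, matching the paper's formulas.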
Answered by Michael Solotky on August 13, 2020