
Question about Relative-Position-Representation code

Data Science Asked by DunkOnly on May 26, 2021

In https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/layers/common_attention.py, in the _relative_attention_inner method, which I think is one of the core pieces of code for https://arxiv.org/abs/1803.02155 (Self-Attention with Relative Position Representations):

x_t_r has shape [sequence_length, batch_size, hidden_size],

z has shape [sequence_length, sequence_length, hidden_size],

and tf.matmul of x_t_r and z then yields a tensor of shape [sequence_length, batch_size, sequence_length]. How should this line be understood? What is the insight behind it?

x_tz_matmul = tf.matmul(x_t_r, z, transpose_b=transpose)

For brevity, I have omitted the heads dimension from the shapes above.
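
For concreteness, here is a minimal runnable sketch (assuming TensorFlow 2.x and made-up toy dimensions) of what that one line does. tf.matmul treats the leading sequence_length axis of both operands as a batch axis, so for each query position i it multiplies the [batch_size, hidden_size] slice x_t_r[i] against the transposed [sequence_length, hidden_size] slice z[i] of relative-position embeddings:

import tensorflow as tf

# Toy sizes (arbitrary; heads omitted, as in the question).
sequence_length, batch_size, hidden_size = 5, 3, 8

# Queries with the position axis moved to the front.
x_t_r = tf.random.normal([sequence_length, batch_size, hidden_size])
# One [sequence_length, hidden_size] matrix of relative-position
# embeddings per query position.
z = tf.random.normal([sequence_length, sequence_length, hidden_size])

# Batched matmul over the leading axis:
# for each position i: x_t_r[i] @ z[i].T -> [batch_size, sequence_length]
x_tz_matmul = tf.matmul(x_t_r, z, transpose_b=True)
print(x_tz_matmul.shape)  # (5, 3, 5): [sequence_length, batch_size, sequence_length]

# The same computation as an einsum, which makes the indexing explicit:
# x_tz_matmul[i, b, j] = sum_d x_t_r[i, b, d] * z[i, j, d]
same = tf.einsum('ibd,ijd->ibj', x_t_r, z)
print(float(tf.reduce_max(tf.abs(x_tz_matmul - same))))  # ~0.0

The insight, as I read it, is that the relative-position term of the attention logits depends on the query position but is shared across the batch, so moving the position axis to the front lets a single batched matmul compute, for every position i, the dot products between each query vector and all of its relative-position embeddings z[i, j], without tiling z across the batch. The subsequent reshape and transpose in _relative_attention_inner then restore the batch-major [batch_size, sequence_length, sequence_length] layout so this term can be added to the ordinary query-key logits.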
