Data Science: asked by DunkOnly on May 26, 2021
In https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/layers/common_attention.py, the `_relative_attention_inner` method is, I think, one of the core pieces of code for https://arxiv.org/abs/1803.02155.
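For context, the attention logits with relative position representations in that paper are, as I transcribe them:

$$e_{ij} = \frac{x_i W^Q \left(x_j W^K + a_{ij}^K\right)^{\top}}{\sqrt{d_z}} = \frac{x_i W^Q \left(x_j W^K\right)^{\top} + x_i W^Q \left(a_{ij}^K\right)^{\top}}{\sqrt{d_z}},$$

where $a_{ij}^K$ is the learned embedding of the relative position $j - i$. The second term in the numerator is, as far as I can tell, what the line asked about below computes, with `z[i, j]` playing the role of $a_{ij}^K$.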
`x_t_r` has shape `[sequence_length, batch_size, hidden_size]` and `z` has shape `[sequence_length, sequence_length, hidden_size]`. `tf.matmul` then multiplies `x_t_r` with `z` (transposing the last two axes of `z`) to get a tensor of shape `[sequence_length, batch_size, sequence_length]`, which the surrounding code transposes back to `[batch_size, sequence_length, sequence_length]`. How should I understand this line, and what is the insight behind it?
```python
x_tz_matmul = tf.matmul(x_t_r, z, transpose_b=transpose)
```

To be brief, I have removed the heads dimension from all the shapes above.
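To convince myself of what the matmul does shape-wise, here is a minimal, self-contained sketch; the heads dimension is removed as above, `transpose_b` is fixed to `True` (the keys case), and the toy sizes and variable names are mine, not from the repo. It checks that the batched matmul over the leading `sequence_length` axis equals an explicit per-position product:

```python
import tensorflow as tf

seq_len, batch, depth = 5, 3, 8  # toy sizes, my choice

# x_t_r: queries with the position axis moved to the front,
# shape [seq_len, batch, depth] (heads dropped for brevity).
x_t_r = tf.random.normal([seq_len, batch, depth])

# z: relative position embeddings, shape [seq_len, seq_len, depth];
# z[i, j] is the embedding of key position j relative to query position i.
z = tf.random.normal([seq_len, seq_len, depth])

# The line in question: a batched matmul whose batch axis is the query
# position i, not the example index. For each i:
#   [batch, depth] @ [depth, seq_len] -> [batch, seq_len]
# so x_tz_matmul[i, b, j] = <query of example b at position i, z[i, j]>.
x_tz_matmul = tf.matmul(x_t_r, z, transpose_b=True)
print(x_tz_matmul.shape)  # (5, 3, 5) == [seq_len, batch, seq_len]

# Sanity check against an explicit loop over query positions.
manual = tf.stack(
    [tf.matmul(x_t_r[i], z[i], transpose_b=True) for i in range(seq_len)])
tf.debugging.assert_near(x_tz_matmul, manual)

# The surrounding code then transposes to [batch, seq_len, seq_len] so the
# result can be added to the ordinary query-key logits.
logits_rel = tf.transpose(x_tz_matmul, [1, 0, 2])
print(logits_rel.shape)  # (3, 5, 5)
```

If I read it right, the reason for moving the position axis to the front is that each query position i needs its own matrix `z[i]` of relative embeddings, so batching the matmul over positions avoids materializing a broadcast tensor of shape `[batch, seq_len, seq_len, depth]`.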