Data Science: Asked by Ilya.K. on May 1, 2021
Hi, in the original paper the following scheme of the self-attention module appears (figure not shown here):

https://arxiv.org/pdf/1805.08318.pdf

In a later overview, a different scheme appears (also not shown here), referring back to the original paper:

https://arxiv.org/pdf/1906.01529.pdf
My understanding aligns more with the scheme in the second paper, in which there are two dot-product operations and three learned parameter matrices:
$$W_k, W_v, W_q$$
These correspond to $W_f, W_g, W_h$, but without the additional $W_v$ that appears in the original paper's explanation, which is as follows:
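(Since the image did not carry over, here are the equations from the published version of the original paper as I understand them:)

$$f(x) = W_f\,x, qquad g(x) = W_g\,x, qquad h(x) = W_h\,x$$
$$beta_{j,i} = frac{exp(s_{ij})}{sum_{i=1}^{N} exp(s_{ij})}, qquad s_{ij} = f(x_i)^{top}\, g(x_j)$$
$$o_j = W_v left( sum_{i=1}^{N} beta_{j,i}\, h(x_i) right), qquad y_i = gamma\, o_i + x_i$$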
Is this a mistake in the original paper?
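To make the comparison concrete, here is a minimal PyTorch sketch of the attention block as I read the original paper. The class and variable names are my own illustrative choices; the $C/8$ channel reduction follows the paper, and note the fourth matrix $W_v$ applied after the attention-weighted sum:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention2d(nn.Module):
    """Self-attention over the spatial positions of a feature map (my sketch)."""
    def __init__(self, channels: int):
        super().__init__()
        # f, g, h, v are 1x1 convolutions, i.e. per-position linear maps.
        self.f = nn.Conv2d(channels, channels // 8, 1)  # ~ W_f (key-like)
        self.g = nn.Conv2d(channels, channels // 8, 1)  # ~ W_g (query-like)
        self.h = nn.Conv2d(channels, channels // 8, 1)  # ~ W_h (value-like)
        self.v = nn.Conv2d(channels // 8, channels, 1)  # the extra W_v in question
        self.gamma = nn.Parameter(torch.zeros(1))       # residual scale, starts at 0

    def forward(self, x):
        B, C, H, W = x.shape
        N = H * W
        f = self.f(x).view(B, -1, N)  # (B, C/8, N)
        g = self.g(x).view(B, -1, N)  # (B, C/8, N)
        h = self.h(x).view(B, -1, N)  # (B, C/8, N)
        # First dot product: s_ij = f(x_i)^T g(x_j); softmax over i gives beta_{j,i}.
        beta = F.softmax(torch.bmm(f.transpose(1, 2), g), dim=1)  # (B, N, N)
        # Second dot product: attention-weighted sum of h, then W_v maps back to C channels.
        o = self.v(torch.bmm(h, beta).view(B, -1, H, W))          # (B, C, H, W)
        return self.gamma * o + x
```

In this sketch $W_v$ is not one of the three attention matrices at all but a fourth, output-side projection mapping the reduced $C/8$ channels back to $C$; whether that matches the intent of the original figure is exactly what I am asking about.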