
Attention Mechanisms and Alignment Models in Machine Translation

Cross Validated · Asked on December 5, 2021

From the paper that introduced attention mechanisms (Bahdanau et al., 2014: Neural Machine Translation by Jointly Learning to Align and Translate), it seems that the translating part is a regular RNN/LSTM encoder-decoder, while the aligning part, a smaller MLP, is the actual attention mechanism, used to align words in the source sentence with the words being generated in the target sentence.
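For reference, the alignment model in the paper scores each encoder annotation against the previous decoder state with a small feed-forward network (in the paper's notation):

$$e_{ij} = v_a^\top \tanh(W_a s_{i-1} + U_a h_j), \qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j$$

where $h_j$ are the encoder annotations, $s_{i-1}$ is the previous decoder state, and the context vector $c_i$ is fed into the decoder when predicting target word $i$.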

Is that interpretation correct? Is the so-called attention mechanism the alignment model?

If so, is the attention mechanism used to attend to certain words in the source sentence at each step of predicting the words of the target sentence?

One Answer

Yes, this is the idea that the original paper promoted.

Note, however, that the term alignment is a little tricky here. In statistical machine translation, an alignment was defined as a meaning correspondence between source and target words. A highly cited 2017 study showed that attention may learn very unintuitive alignments that look completely wrong, yet with no loss of translation quality.
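To make the mechanics concrete, here is a minimal NumPy sketch of one decoding step of this additive attention. The parameter names ($W_a$, $U_a$, $v_a$) follow the paper's notation; the dimensions in the demo are illustrative assumptions, not values from the paper:

    import numpy as np

    def additive_attention(s_prev, H, W_a, U_a, v_a):
        # Alignment scores e_j = v_a . tanh(W_a s_{i-1} + U_a h_j),
        # one score per source position
        e = np.tanh(W_a @ s_prev + H @ U_a.T) @ v_a
        # Softmax turns the scores into a soft alignment over the source words
        alpha = np.exp(e - e.max())
        alpha = alpha / alpha.sum()
        # Context vector: attention-weighted average of encoder states
        c = alpha @ H
        return alpha, c

    # Illustrative dimensions (assumed for the demo)
    rng = np.random.default_rng(0)
    T_src, d_enc, d_dec, d_att = 6, 8, 8, 5
    H = rng.standard_normal((T_src, d_enc))   # encoder annotations h_1..h_T
    s_prev = rng.standard_normal(d_dec)       # previous decoder state s_{i-1}
    W_a = rng.standard_normal((d_att, d_dec))
    U_a = rng.standard_normal((d_att, d_enc))
    v_a = rng.standard_normal(d_att)

    alpha, c = additive_attention(s_prev, H, W_a, U_a, v_a)
    print(alpha.round(3), alpha.sum())        # the weights sum to 1

Stacking the $\alpha$ vectors over all decoder steps gives the soft alignment matrix that the paper visualizes, and it is exactly this matrix that the 2017 study found can deviate from intuitive word alignments without hurting translation quality.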

Answered by Jindřich on December 5, 2021
