Can I fine-tune BERT, ELMo, or XLNet for Seq2Seq neural machine translation?

Question

I'm working on a neural machine translator that translates English sentences into American Sign Language sentences (example below). I have quite a small dataset of around 1,000 sentence pairs. I'm wondering whether it is possible to fine-tune BERT, ELMo, or XLNet for Seq2Seq encoder/decoder machine translation.

English: He sells food.

American Sign Language: Food he sells

Jindřich · Answer

You can view models like ELMo or BERT as encoder-only. They can easily be used for classification or sequence tagging, but in tagging the output sequence is typically monotonically aligned with the source sequence. Even though the Transformer layers in BERT or XLNet are in theory capable of arbitrary reordering (which is exploited in non-autoregressive machine translation models), this is not what BERT or XLNet were trained for, so they will be hard to fine-tune for that.
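To make "monotonically aligned" concrete: a tagger emits one label per source position, in source order, whereas the example pair in the question needs its words reordered. The sketch below uses two hypothetical helpers (`alignment`, `is_monotonic`) and assumes whitespace tokenization, punctuation already stripped, and each target word appearing exactly once in the source. It maps each target word back to its source position and checks whether those positions ever go backwards.

```python
# Minimal sketch. Assumptions: whitespace tokens, no punctuation,
# and every target word occurs exactly once in the source.

def alignment(source: list[str], target: list[str]) -> list[int]:
    """For each target token, return the index of the matching source token."""
    lowered = [tok.lower() for tok in source]
    return [lowered.index(tok.lower()) for tok in target]


def is_monotonic(positions: list[int]) -> bool:
    """True if the aligned source positions never go backwards."""
    return all(a <= b for a, b in zip(positions, positions[1:]))


src = "He sells food".split()
tgt = "Food he sells".split()
pos = alignment(src, tgt)
print(pos)                # [2, 0, 1]
print(is_monotonic(pos))  # False: "food" must move to the front
```

A tagging-style output would always give positions `[0, 1, 2]`, so the reordering here is exactly what an encoder-only model fine-tuned as a tagger is not set up to produce.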

If at least the vocabulary is shared between the source and target sides, I would recommend a pre-trained sequence-to-sequence model such as MASS or BART.

If both the grammar and the vocabulary of the sign language are quite different, using BERT as an encoder and training your own lightweight autoregressive decoder on top of it might be the right way.
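What "autoregressive" buys you is that the decoder generates target tokens one at a time, each conditioned on the encoder states and on the tokens already produced, so the output order is free to differ from the input order. The sketch below shows only that generation loop (greedy decoding). It assumes the encoder states would come from BERT's last hidden layer and that the real decoder would be a small trained network (for example an RNN or Transformer decoder attending over those states); `toy_step` and `fake_states` are hypothetical stand-ins so the loop can run on its own.

```python
from typing import Callable, Sequence

# Hypothetical decoder step: given encoder states (e.g. BERT's last hidden layer)
# and the target prefix generated so far, return a score for each vocabulary token.
StepFn = Callable[[Sequence[Sequence[float]], list[str]], dict[str, float]]


def greedy_decode(encoder_states: Sequence[Sequence[float]], step: StepFn,
                  bos: str = "<s>", eos: str = "</s>", max_len: int = 20) -> list[str]:
    """Generate target tokens one at a time, each conditioned on the previous ones."""
    prefix = [bos]
    for _ in range(max_len):
        scores = step(encoder_states, prefix)
        next_tok = max(scores, key=scores.get)  # greedy: take the highest-scoring token
        if next_tok == eos:
            break
        prefix.append(next_tok)
    return prefix[1:]  # drop the beginning-of-sequence marker


def toy_step(encoder_states, prefix):
    # Stand-in for a trained decoder: it ignores the encoder states and just
    # follows a fixed script indexed by how many tokens have been generated.
    script = ["Food", "he", "sells", "</s>"]
    wanted = script[len(prefix) - 1]
    return {tok: (1.0 if tok == wanted else 0.0) for tok in script}


fake_states = [[0.0] * 4 for _ in range(3)]  # pretend encoder output: 3 tokens x 4 dims
print(greedy_decode(fake_states, toy_step))  # ['Food', 'he', 'sells']
```

In a real system only `step` changes: it would run the trained decoder over `encoder_states` and the prefix, while the loop itself stays the same. Beam search is a common replacement for the greedy `max` choice.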
