What's the input dimension for transformer decoder during TRAINING?

Question

For example, take translating an English sentence A into a French sentence B.
During training, when predicting the i-th word of B, all the words of B before position i are fed to the decoder, so the input length changes with i. How is this handled so that it fits the fixed input dimension of the final linear layer during TRAINING?

SrJ · Answer

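During training we don't actually feed the words one by one. We pass the whole target sentence (shifted right by one position) to the decoder in a single pass, along with a causal mask. The mask lets position i attend only to positions up to i, so it does the job of revealing one new word at a time. The final linear layer is applied to each position separately, so its input dimension is the model's hidden size and doesn't depend on how many words came before.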
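To make the masking idea concrete, here is a minimal NumPy sketch, not a full decoder. It assumes a single attention head and leaves out embeddings, cross-attention, layer norm and padding. All names (`causal_mask`, `masked_self_attention`, `decode_logits`, `d_model`, `vocab_size`) and the random weights are made up for illustration.

```python
import numpy as np

def causal_mask(T):
    # True where position i may attend to position j (only j <= i).
    return np.tril(np.ones((T, T), dtype=bool))

def masked_self_attention(x, Wq, Wk, Wv):
    # x has shape (T, d_model); T can be any sequence length.
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    # Future positions get -inf, so their softmax weight becomes 0.
    scores = np.where(causal_mask(T), scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 20
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
W_out = rng.normal(size=(d_model, vocab_size))  # the "final linear layer"

def decode_logits(x):
    # The linear layer acts on the last axis (d_model), once per position.
    return masked_self_attention(x, Wq, Wk, Wv) @ W_out

short = rng.normal(size=(3, d_model))  # a 3-word target sentence
long = rng.normal(size=(7, d_model))   # a 7-word target sentence
logits_short = decode_logits(short)    # shape (3, vocab_size)
logits_long = decode_logits(long)      # shape (7, vocab_size)

# Replace the words at positions 4..6 and run the decoder again.
changed = long.copy()
changed[4:] = rng.normal(size=(3, d_model))
logits_changed = decode_logits(changed)

# Positions 0..3 never saw the later words, so their outputs are unchanged.
print(np.allclose(logits_long[:4], logits_changed[:4]))
```

The same weights handle both sentence lengths because `W_out` only ever sees vectors of size `d_model`, one per position. In a real batch, sentences of different lengths would additionally be padded to a common length and the padding masked out.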
