
Backpropagation of a transformer

Data Science Asked by prog on March 10, 2021

When a transformer model is trained, there is a linear layer at the end of the decoder, which I understand is a fully connected neural network. During training, when a loss is obtained, it is backpropagated to adjust the weights.

My question is: how deep does the backpropagation go?

  • Does it happen only up to the linear layer weights (the fully connected network)?
  • Or does it extend to all the decoder layers' weight matrices (Q, K, V) and feed-forward weights?
  • Or does it extend even further, to both the encoder and decoder weights?

Please help me with the answer.

One Answer

Backpropagation extends through the full model: the final linear layer, every decoder layer (including the Q, K, V projections and feed-forward sublayers), every encoder layer, and all the way down to the embedding tables.

Correct answer by noe on March 10, 2021
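
You can verify this yourself. Below is a minimal PyTorch sketch (toy sizes, dummy data; the model here is a stand-in, not the asker's actual model) that builds an embedding table, an encoder-decoder transformer, and a final linear layer, runs one backward pass, and checks that every parameter received a gradient:

```python
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32

# Embedding table -> encoder-decoder transformer -> final linear layer
embedding = nn.Embedding(vocab_size, d_model)
transformer = nn.Transformer(d_model=d_model, nhead=4,
                             num_encoder_layers=2, num_decoder_layers=2,
                             batch_first=True)
lm_head = nn.Linear(d_model, vocab_size)

src = torch.randint(0, vocab_size, (1, 7))   # dummy source tokens
tgt = torch.randint(0, vocab_size, (1, 5))   # dummy target tokens

out = transformer(embedding(src), embedding(tgt))  # encoder + decoder
logits = lm_head(out)

loss = nn.functional.cross_entropy(logits.view(-1, vocab_size),
                                   tgt.view(-1))
loss.backward()

# Every parameter -- embeddings, Q/K/V projections, feed-forward
# sublayers, and the final linear layer -- now holds a gradient.
for module in (embedding, transformer, lm_head):
    for name, p in module.named_parameters():
        assert p.grad is not None, name
print("all parameters received gradients")
```

Autograd records the whole forward graph, so a single call to `loss.backward()` reaches every parameter that contributed to the loss; nothing special is needed to make the gradient "extend" past the linear layer.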

