
Backpropagation of a transformer

Data Science Asked by prog on March 10, 2021

When a transformer model is trained, there is a linear layer at the end of the decoder, which I understand is a fully connected neural network. During training, when a loss is obtained, it is backpropagated to adjust the weights.

My question is: how deep does the backpropagation go?

  • Does it happen only up to the linear layer weights (the fully connected network)?
  • Or does it extend to all the decoder layers' weight matrices (Q, K, V) and feed-forward weights?
  • Or does it extend even further, to both the encoder and decoder weights?

Please help me with the answer.

One Answer

Backpropagation extends through the full model: the final linear layer, every decoder layer (including the Q, K, V projections and feed-forward sublayers), every encoder layer, and all the way down to the embedding tables.

Correct answer by noe on March 10, 2021
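
You can verify this yourself. Below is a minimal PyTorch sketch (toy sizes, dummy data; the model here is a stand-in, not the asker's actual model) that builds an embedding table, an encoder-decoder transformer, and a final linear layer, runs one backward pass, and checks that every parameter received a gradient:

```python
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32

# Embedding table -> encoder-decoder transformer -> final linear layer
embedding = nn.Embedding(vocab_size, d_model)
transformer = nn.Transformer(d_model=d_model, nhead=4,
                             num_encoder_layers=2, num_decoder_layers=2,
                             batch_first=True)
lm_head = nn.Linear(d_model, vocab_size)

src = torch.randint(0, vocab_size, (1, 7))   # dummy source tokens
tgt = torch.randint(0, vocab_size, (1, 5))   # dummy target tokens

out = transformer(embedding(src), embedding(tgt))  # encoder + decoder
logits = lm_head(out)

loss = nn.functional.cross_entropy(logits.view(-1, vocab_size),
                                   tgt.view(-1))
loss.backward()

# Every parameter -- embeddings, Q/K/V projections, feed-forward
# sublayers, and the final linear layer -- now holds a gradient.
for module in (embedding, transformer, lm_head):
    for name, p in module.named_parameters():
        assert p.grad is not None, name
print("all parameters received gradients")
```

Autograd records the whole forward graph, so a single call to `loss.backward()` reaches every parameter that contributed to the loss; nothing special is needed to make the gradient "extend" past the linear layer.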

