Data Science Asked by prog on March 10, 2021
When a transformer model is trained, there is a linear layer at the end of the decoder, which I understand is a fully connected neural network. During training, when the loss is computed, it is backpropagated to adjust the weights.
My question is: how deep does the backpropagation go? Does it stop at that final linear layer, or does it reach further back into the model?
Backpropagation extends through the full model: the final linear layer, all decoder and encoder layers, and all the way back to the embedding tables. Every trainable parameter receives a gradient and is updated.
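You can verify this yourself with a minimal sketch in PyTorch (the model here is a toy single-layer encoder, not the exact architecture from the question, and all sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size, d_model, seq_len = 100, 32, 10

# Toy stack: embedding table -> one transformer layer -> final linear layer.
embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
head = nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (1, seq_len))
targets = torch.randint(0, vocab_size, (1, seq_len))

logits = head(layer(embed(tokens)))
loss = nn.functional.cross_entropy(logits.view(-1, vocab_size),
                                   targets.view(-1))
loss.backward()

# The gradient reaches not only the final linear layer but also the
# transformer layer and the embedding table at the very start.
print(head.weight.grad is not None)   # True
print(embed.weight.grad is not None)  # True
```

If backpropagation stopped at the output layer, `embed.weight.grad` would be `None` after `loss.backward()`; the fact that it is populated shows the gradient flows through the entire network.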
Correct answer by noe on March 10, 2021