Data Science Asked by Pedro Henrique Gomes Venturott on November 26, 2020
I’m having this burning question in my head, and I couldn’t find the answer anywhere. During training, at least in Keras, the training loss is computed on the current batch so that the weights can be updated. So, at least in the first epoch, every batch loss is computed before the model actually learns from that particular batch. Given this, shouldn’t the validation loss for the first epoch be somewhat close to the training loss for the first epoch, since both are computed on examples not yet seen by the gradient descent algorithm? For every model I’ve built so far, the training loss is always lower than the validation loss. I expected the training loss to be somewhat close to the validation loss on the first run through the dataset, and then, as the model learns from the training dataset (but not from the validation dataset), a gap between the two losses would start to open. Am I missing something trivial here?
Is your batch size the length of the full dataset? If you have $N$ samples and feed in mini-batches of size $k<N$, then a batch loss is computed and the model's weights are updated after every $k$ samples. By the end of the first epoch, there may already have been a significant number of updates (i.e., learning) from the training set. I believe Keras aggregates these batch losses to compute the epoch loss.
If $k=N$, however, it is possible that the distributions of your training and validation sets are different.
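To make the first point concrete, here is a minimal NumPy sketch of one epoch of mini-batch training. It is not Keras' actual training loop; the toy linear model, the learning rate `lr`, and the batch size `k` are illustrative assumptions. It shows that with $N$ samples and batch size $k<N$ the weights are updated $\lceil N/k \rceil$ times during the first epoch, and that the logged epoch training loss is an average of per-batch losses, each computed just before that batch's own update, while the validation loss is computed with the end-of-epoch weights.

```python
# Minimal sketch of one epoch of mini-batch training (not Keras' internals).
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = 3x + noise (illustrative assumption)
N, k = 1000, 32                        # N samples, mini-batch size k < N
X = rng.normal(size=(N, 1))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=N)

X_val = rng.normal(size=(200, 1))
y_val = 3.0 * X_val[:, 0] + 0.1 * rng.normal(size=200)

w, lr = 0.0, 0.05                      # single weight, learning rate (assumed)
batch_losses = []

for start in range(0, N, k):           # ceil(N / k) updates in one epoch
    xb, yb = X[start:start + k, 0], y[start:start + k]
    pred = w * xb
    loss = np.mean((pred - yb) ** 2)   # computed with the *current* weights
    batch_losses.append(loss)
    grad = np.mean(2 * (pred - yb) * xb)
    w -= lr * grad                     # weights change before the next batch

epoch_train_loss = np.mean(batch_losses)            # average over batch losses
val_loss = np.mean((w * X_val[:, 0] - y_val) ** 2)  # uses end-of-epoch weights

print(f"updates in first epoch: {len(batch_losses)}")
print(f"epoch 1 training loss (avg over batches): {epoch_train_loss:.4f}")
print(f"epoch 1 validation loss (final weights):  {val_loss:.4f}")
```

So even within epoch 1, most batch losses are measured on a model that has already learned from the preceding batches, which is why the reported training loss and the validation loss need not match.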
Answered by Adam on November 26, 2020