Data Science Asked by Sia Rezaei on February 18, 2021
This is an issue that I have come across over and over again: loss (cross-entropy in this case) and accuracy plots that do not make sense. Here is an example:
Here, I’m training a ResNet18 on CIFAR10. The optimizer is SGD with a learning rate of 0.1, Nesterov momentum of 0.9, and weight decay of 1e-4. The learning rate is decreased to ⅕ of its value at epochs 60, 120, and 160.
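For reference, a minimal sketch of this setup in PyTorch (the torchvision ResNet-18 and the MultiStepLR scheduler are my assumptions; only the hyperparameters come from the description above, and CIFAR10 variants of ResNet-18 often also modify the stem):

```python
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=10)  # CIFAR10 has 10 classes

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,
    momentum=0.9,
    nesterov=True,
    weight_decay=1e-4,
)
# "Decreased to 1/5 at epochs 60, 120, 160" corresponds to gamma=0.2
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.2
)

for epoch in range(200):
    ...  # one training epoch and a validation pass go here
    scheduler.step()
```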
Now there are two things that don’t make sense to me:
1. After epoch 120 (where the LR is decreased), validation loss and accuracy start improving for a couple of epochs (the green box). Why would decreasing the learning rate suddenly improve the validation performance of a model that was already overfitting?! I would expect the drop in LR to actually accelerate overfitting.
2. After epoch ~125 (the blue box), loss starts going up but accuracy keeps improving. I understand that loss could go up while accuracy stays constant (by the model getting more confident in its wrong predictions or less confident in its correct predictions). But I don’t get how accuracy can improve while loss goes up; a toy illustration of how this can happen follows this list.
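Here is a hypothetical toy batch (the numbers are mine, not from the plots above) showing one way accuracy and loss can rise together: a previously wrong sample becomes barely correct while the confidently correct samples lose confidence, so accuracy goes up and the mean cross-entropy goes up too.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for 3 samples and 2 classes; all true labels are class 0.
labels = torch.tensor([0, 0, 0])

# Earlier state: two confidently correct samples, one wrong one (accuracy 2/3).
earlier = torch.tensor([[6.0, 0.0], [6.0, 0.0], [-0.5, 0.5]])
# Later state: every sample is barely correct (accuracy 3/3), but the model
# has lost confidence everywhere, so the mean cross-entropy is higher.
later = torch.tensor([[0.05, 0.0], [0.05, 0.0], [0.05, 0.0]])

for name, logits in [("earlier", earlier), ("later", later)]:
    acc = (logits.argmax(dim=1) == labels).float().mean()
    loss = F.cross_entropy(logits, labels)
    print(f"{name}: accuracy={acc:.2f}, mean loss={loss:.3f}")
# earlier: accuracy=0.67, mean loss=0.439
# later:   accuracy=1.00, mean loss=0.668
```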
Just a couple of points below:
Generally, a smaller LR means the model has less "freedom" to hop far around in the parameter space, so there is less chance of seeing significant, fast changes in the loss. As you suggest, there is a slight upward overfitting trend in your val loss, but it is only as significant or fast as the smaller LR allows.
Also, note that your train loss is higher than your val loss for a good portion of your training (until roughly epoch 55). You may want to investigate how your model is regularised, as this may affect your learning curves in ways that help you infer diagnostics better. E.g. in PyTorch, the loss computed in training mode (model.train()) can differ significantly from the loss computed in evaluation mode (model.eval()).
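As a quick sketch of that last point (the toy network and data below are placeholders, not the ResNet-18 run from the question), the same batch can produce noticeably different losses in the two modes because dropout and batch normalisation behave differently:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy network containing dropout and batch norm, the two common layers
# whose behaviour changes between model.train() and model.eval().
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(64, 10),
)

x = torch.randn(32, 20)          # placeholder batch
y = torch.randint(0, 10, (32,))  # placeholder labels

with torch.no_grad():
    model.train()  # dropout active, batch norm uses batch statistics
    loss_train_mode = F.cross_entropy(model(x), y)
    model.eval()   # dropout off, batch norm uses running statistics
    loss_eval_mode = F.cross_entropy(model(x), y)

print(f"train mode: {loss_train_mode.item():.3f}, "
      f"eval mode: {loss_eval_mode.item():.3f}")
```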
Answered by hH1sG0n3 on February 18, 2021