
choosing model based on last or best iteration on validation set

Data Science Asked by kradant on February 3, 2021

This is a very basic question, however I haven’t found a satisfying answer until now.
When training a neural network we must choose the number of epochs. The usual advice is to train as long as the loss on the validation set goes down. But it happens that the validation loss goes down, then a little bit up, then down again. So for me the most reliable way to find the point to stop training and get the best model would be to take a very high number of epochs and just choose the best validation loss out there (be it the last iteration or, more probable, something in between). One drawback is of course the long training time due to high number of epochs.
To avoid long training times and still get a good validation score, people use "early stopping" techniques (as described in the often-cited paper Early Stopping — But When? by Lutz Prechelt). In that paper, the author points out that the validation curve can indeed go up and down, and you can never be sure that you have reached some kind of absolute minimum. As a consequence, if you use "early stopping" methods, you introduce some kind of bias into your model training.
That is why I thought to train for as many epochs as I can afford and to choose the best epoch to stop at afterwards. I thought this would be less biased and the most "correct" way to choose the number of epochs for training.
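
To make this concrete, here is a minimal, framework-agnostic sketch of this "train long, keep the best checkpoint" idea. train_one_epoch, evaluate and snapshot_weights are assumed placeholder callables, not any particular library's API:

import math

def train_keeping_best(train_one_epoch, evaluate, snapshot_weights, num_epochs=200):
    # Train for a fixed, generous epoch budget and keep the weights from the
    # epoch with the lowest validation loss instead of the last one.
    # All three arguments are placeholder callables:
    #   train_one_epoch()  - one pass over the training data
    #   evaluate()         - returns the current validation loss
    #   snapshot_weights() - returns a copy of the current model weights
    best_loss, best_epoch, best_weights = math.inf, -1, None
    for epoch in range(num_epochs):
        train_one_epoch()
        val_loss = evaluate()
        if val_loss < best_loss:
            best_loss, best_epoch, best_weights = val_loss, epoch, snapshot_weights()
    return best_weights, best_epoch, best_loss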
In stackexchange I found the following related questions:

Present results of the best or the last iteration on dev set?, where the answer is that we should choose the best iteration and not the last one.

Reporting test result for cross-validation with Neural Network, where the answer is that it is not correct to choose the best iteration since this also introduces a bias. Instead, one should opt for an a priori fixed number of epochs and choose the last iteration.

If I put the theory aside for a moment, I think it is unwise to stop training too soon while the validation loss is still going down. It is also unwise to accept a higher validation loss just because I trained until overfitting. So clearly, I am inclined to take some minimum.

To sum it up, at the bottom is a plot of my validation loss from a 5-fold CV. Early stopping could have halted training at those strange plateaus. Using the best loss score also seems a bit "biased" in those highly jittery regions.

What is the correct way to stop training here?

[Figure: validation loss curves from the 5-fold cross-validation]

One Answer

Too much noise in the curve may indicate that the learning rate is too high. You could try reducing the learning rate as the number of epochs grows, for example once the curve becomes flat.
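
For instance, with PyTorch one could let ReduceLROnPlateau lower the learning rate once the validation loss stops improving; the toy model and data below are purely illustrative:

import torch

model = torch.nn.Linear(10, 1)                          # toy model, purely illustrative
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5)      # halve the LR after 5 epochs without improvement

x_val, y_val = torch.randn(64, 10), torch.randn(64, 1)  # toy validation data

for epoch in range(100):
    # ... the actual training step is omitted; only the scheduler interaction is shown ...
    with torch.no_grad():
        val_loss = torch.nn.functional.mse_loss(model(x_val), y_val)
    scheduler.step(val_loss)                             # lowers the LR when val_loss plateaus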

You can also use early stopping with a "window": stop training if the loss has not improved for N epochs, where "improved" can mean by a percentage or by a fixed amount of your choice.
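
A minimal sketch of such a patience-based stopping rule, with train_one_epoch and evaluate again as placeholder callables; patience plays the role of the window N and min_delta is the improvement threshold (both default values are arbitrary):

def train_with_early_stopping(train_one_epoch, evaluate,
                              max_epochs=500, patience=20, min_delta=1e-4):
    # Stop once the validation loss has not improved by at least min_delta
    # for `patience` consecutive epochs.
    best_loss = float("inf")
    epochs_since_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        val_loss = evaluate()
        if val_loss < best_loss - min_delta:
            best_loss = val_loss
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
        if epochs_since_improvement >= patience:
            break  # the window ran out without meaningful improvement
    return best_loss, epoch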

I would use the best epoch. Using the last epoch sounds weird to me and could also introduce some bias if you wait to see some initial results before choosing a maximum number of epochs (and you are probably going to do that).

Also, you may want to spend some time trying to improve your training set if you have not done so already. Maybe you can find more data or augment it.

If you are unable to overfit, you may also want to try a more complex model. Regularization/dropout could help with your learning curves too.

Answered by Peque on February 3, 2021
