Is it possible for a model to perform better after a few epochs than after hundreds of epochs?

Question

I had a very interesting experience with my CNN binary image classifier. Do you think the result is due to chance, or is there a logic behind it?
I used transfer learning with InceptionV3 and a softmax output (I know you will ask why not ReLU), but that is what I did.
I trained for 100 epochs, but the result was terrible. During training I noticed that at the 12th epoch the result was excellent (both training accuracy and validation accuracy). So I retrained the model for 12 epochs, and surprisingly the result on the test data was also excellent.
Does this mean the result is better with only a few epochs?

Shahriyar Mammadli · Answer

Yes. This is the reason you should use early stopping in your models: it stops training when the model is no longer improving on the validation data. Alternatively, you can keep the training history and pick the epoch that had the best validation performance.
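A minimal sketch of the early-stopping rule, assuming you already have the validation loss for each epoch (for example, collected from the training history). The function name `early_stopping`, the `patience` and `min_delta` defaults, and the loss values are hypothetical and chosen only for illustration; they are not taken from the question. The function replays a run and reports which epoch's weights you would keep and at which epoch training would have stopped. In Keras, the same idea is available during training as `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)`.

```python
from typing import List, Tuple


def early_stopping(val_losses: List[float], patience: int = 3,
                   min_delta: float = 0.0) -> Tuple[int, int]:
    """Replay a training run epoch by epoch and apply early stopping.

    val_losses[i] is the validation loss after epoch i + 1.
    Returns (best_epoch, stop_epoch), both 1-based: the epoch whose
    weights you would keep, and the epoch at which training stops.
    (Hypothetical helper for illustration, not a library API.)
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # Improvement: remember this epoch (a real run would save weights here).
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                # No improvement for `patience` epochs in a row: stop training.
                return best_epoch, epoch
    # Ran out of epochs before patience was exhausted.
    return best_epoch, len(val_losses)


# Hypothetical validation losses, made up for illustration:
# they fall until epoch 5 and then rise as the model starts to overfit.
history = [0.69, 0.55, 0.45, 0.40, 0.38, 0.41, 0.44, 0.50, 0.58]
print(early_stopping(history, patience=3))  # (5, 8)
```

Training stops at epoch 8, but the weights worth keeping are from epoch 5, where validation loss was lowest. With `restore_best_weights=True`, the Keras callback likewise rolls the model back to the weights from the best epoch when it stops, which is what picking the 12th epoch by hand achieves in the question.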
The reason you get excellent results at the 12th epoch but terrible performance at the 100th epoch is simply that you are overtraining. Overtraining causes overfitting: the model stops generalizing and instead memorizes your training data. Thus, when you train too long, the model has high accuracy on in-sample data but comparatively poor accuracy on out-of-sample data.
Moreover, keep in mind that unnecessarily complex and poorly regularized models are also likely to overfit, especially when the training dataset is small. In any case, if performance drops as you keep training the model, overtraining is the most likely cause.
For this reason, always try to plot training accuracy against test (or validation) accuracy by epoch. That way you can see in which epochs training and test accuracy move together. Where your training accuracy is much higher than your test (or validation) accuracy, the model is overfitting; where both training and test (or validation) accuracy are still low, the model is underfitting.
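The same comparison can also be checked numerically. A minimal sketch, assuming you have per-epoch training and validation accuracy as two lists (in Keras, `model.fit(...)` returns a `History` object whose `history` dict typically contains `accuracy` and `val_accuracy` lists when the model is compiled with `metrics=["accuracy"]` and validation data is passed). The function name `overfitting_epochs`, the 0.05 gap threshold, and the accuracy values are hypothetical. It flags the epochs where training accuracy pulls ahead of validation accuracy, which is where overfitting shows up on the plot.

```python
from typing import List


def overfitting_epochs(train_acc: List[float], val_acc: List[float],
                       max_gap: float = 0.05) -> List[int]:
    """Return the 1-based epochs where training accuracy exceeds
    validation accuracy by more than `max_gap` (a hypothetical threshold)."""
    if len(train_acc) != len(val_acc):
        raise ValueError("train_acc and val_acc must have one value per epoch")
    flagged = []
    for epoch, (tr, va) in enumerate(zip(train_acc, val_acc), start=1):
        gap = tr - va  # generalization gap at this epoch
        if gap > max_gap:
            flagged.append(epoch)
    return flagged


# Hypothetical per-epoch accuracies, made up for illustration:
# the two curves move together until epoch 4, then diverge.
train = [0.70, 0.80, 0.88, 0.93, 0.97, 0.99]
val = [0.68, 0.78, 0.86, 0.90, 0.85, 0.80]
print(overfitting_epochs(train, val))  # [5, 6]
```

The same two lists can be passed to matplotlib's `plt.plot` to draw the graph; the gap check is simply a way to read that graph without eyeballing it.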

Shiv · Answer

This image perfectly describes your situation: you achieved great results at the 12th epoch because after that point your model starts to overfit the training data, which leads to bad test results.
The 12th epoch is your model's best fit.
You would also have noticed that between epochs 1 and 12, both your training and testing error were going down.
