TransWikia.com

What could be the problem leading to the result that a model can never perfectly overfit?

Data Science Asked by CrazyMageQi on May 23, 2021

I tried to fit my model on a small batch of 128 samples for binary classification. The model should be powerful enough, as it has hundreds of thousands of parameters, so it should be able to overfit to 100% accuracy. However, it only fits to 96% at best, about the same as when I train it on 30,000 samples. So I tried the following, but all of it failed:

using a smaller batch of 16 samples: it still cannot overfit;

using different optimizers, including Adam, SGD, and Adagrad, and even resetting the optimizer every 1,000 epochs: not working;

training each epoch only on the samples that are misclassified: not working.

The problem should be with this network, since another, more basic neural network can fit to 100%; this one only reaches 99.2%. The output layer is indeed a sigmoid.
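As a point of comparison (a hypothetical NumPy sketch, not the asker's actual model or data), a small two-layer network with many more parameters than samples will normally memorize a tiny batch of even random labels under plain gradient descent, which is the behavior the question expects:

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, h = 16, 8, 64                        # 16 samples, 8 features, 64 hidden units
X = rng.standard_normal((n, d))
y = rng.integers(0, 2, size=(n, 1)).astype(float)  # random binary labels

W1 = rng.standard_normal((d, h)) * 0.5
b1 = np.zeros(h)
W2 = rng.standard_normal((h, 1)) * 0.1
b2 = np.zeros(1)

lr = 0.5
for step in range(10000):
    # forward pass
    hid = np.tanh(X @ W1 + b1)                     # (n, h)
    p = 1.0 / (1.0 + np.exp(-(hid @ W2 + b2)))     # sigmoid output, (n, 1)

    acc = float(((p > 0.5) == (y > 0.5)).mean())
    if acc == 1.0:                                 # memorized the batch
        break

    # backward pass for binary cross-entropy
    dlogits = (p - y) / n
    dW2 = hid.T @ dlogits
    db2 = dlogits.sum(axis=0)
    dhid = dlogits @ W2.T
    dz1 = dhid * (1.0 - hid ** 2)                  # tanh derivative
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)

    # plain gradient descent: no momentum, no regularization
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(f"training accuracy: {acc:.2f}")
```

If a run on some seed stalls below 100%, lowering the learning rate or adding steps usually fixes it, which mirrors the standard advice for this kind of sanity check.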

Anyone got any idea what could be the problem?

2 Answers

I would guess there are a few samples that aren't necessarily from the same distribution as the rest. I would try identifying outliers and removing them.

Answered by ARandomName on May 23, 2021

Although it is true that you have more parameters than samples, a DNN trains all those parameters at the same time, which makes it harder to overfit. Try reducing the learning rate and using SGD with momentum = 0. In addition, don't forget to remove any kind of regularisation (weight decay, dropout, and the like).

I am assuming you want to keep using 128 samples and the network you have designed, but you can always reduce the number of parameters or test with a standard network (ResNet, Inception, VGG). When I want to overfit a network, I usually take just enough samples for one batch.

Anyhow, if the network already achieves 96%, I would start by reducing the learning rate.
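The learning-rate advice can be seen on a toy problem (a hypothetical sketch, unrelated to the asker's model): plain gradient descent on a one-dimensional quadratic converges only when the step size is small enough, and overshoots indefinitely otherwise.

```python
# Gradient descent on f(w) = w**2, whose gradient is 2*w.
# Each update multiplies w by (1 - 2*lr), so |1 - 2*lr| < 1 is
# required for convergence.
def gd(lr, steps=50, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w
    return w

w_small = gd(lr=0.4)   # |1 - 0.8| = 0.2 < 1: w shrinks toward 0
w_large = gd(lr=1.1)   # |1 - 2.2| = 1.2 > 1: w grows without bound
print(abs(w_small), abs(w_large))
```

A loss that plateaus at 96% can be exactly this kind of overshoot: the iterates bounce around the minimum instead of settling into it, and a smaller step lets them descend the rest of the way.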

Good luck

Answered by Isaac.casm on May 23, 2021
