Data Science Asked by nmtp on June 16, 2021
I’m currently using an autoencoder CNN, built on the VGG-16 architecture, that was designed by someone else. I want to replicate their results on their dataset first, but I’m finding that:
- Validation losses diverge from training losses fairly early on (by around 10 epochs it already looks like it’s overfitting)
- At its best, the validation loss isn’t even close to being as low as the training loss
- In general, the accuracy is worse than that reported in their paper
I’m new to machine learning. Are there hyperparameters I should try changing, or other things I can tinker with, without modifying the architecture?
Are you in fact using the same architecture as they are? If not, that could be the problem.
Otherwise, are you using the same training protocol as they do, i.e. optimizer, learning rate, learning-rate schedule, batch size, preprocessing, weight initialization, and number of training epochs? Depending on the size of your model and the amount of training data, 10 epochs might not be enough to judge your model’s performance.
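As one example of a protocol detail that is easy to get wrong, here is a sketch of a step-decay learning-rate schedule in plain Python. The constants (initial rate, drop factor, drop interval) are placeholders, not values from the paper you are replicating; you would substitute whatever the authors report:

```python
def step_decay_lr(initial_lr, drop_factor, epochs_per_drop, epoch):
    """Step-decay schedule: multiply the learning rate by drop_factor
    every epochs_per_drop epochs. All constants are placeholders."""
    return initial_lr * (drop_factor ** (epoch // epochs_per_drop))

# Example: start at 0.1, halve every 10 epochs (hypothetical values)
print(step_decay_lr(0.1, 0.5, 10, 0))    # epoch 0  -> 0.1
print(step_decay_lr(0.1, 0.5, 10, 10))   # epoch 10 -> 0.05
print(step_decay_lr(0.1, 0.5, 10, 25))   # epoch 25 -> 0.025
```

If the original authors decayed the learning rate and you train at a fixed rate (or vice versa), training and validation curves can look quite different even with an identical architecture.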
Can you link the paper?
Answered by Tinu on June 16, 2021