Data Science Asked by Stephanie Lin on December 28, 2020
When people talk about and use data augmentation, are they mostly referring to real-time data augmentation? In the case of image classification, that would mean augmenting each image on the fly right before it is fed to the model, so a freshly augmented version is generated every epoch. In this case only augmented images are used to train the model and the raw image is never used, so the size of the input dataset doesn't actually change.
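For concreteness, this is what I understand the real-time approach to look like; a minimal sketch assuming Keras, where the parameter values, toy data, and tiny model are purely illustrative:

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Toy data standing in for a real image dataset: 100 RGB images, 2 classes.
x_train = np.random.rand(100, 32, 32, 3).astype("float32")
y_train = np.random.randint(0, 2, size=(100,))

# Random transforms are re-sampled for every batch, every epoch, so the
# network rarely sees exactly the same pixels twice.
datagen = ImageDataGenerator(
    rotation_range=15,       # rotate up to +/- 15 degrees
    width_shift_range=0.1,   # shift horizontally up to 10% of the width
    horizontal_flip=True,    # randomly flip images left-right
)

model = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(32, 32, 3)),
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Augmentation happens on the fly inside flow(); nothing extra is stored.
model.fit(datagen.flow(x_train, y_train, batch_size=32), epochs=5)
```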
But what about non-real-time data augmentation? By this I mean augmenting the data in the preprocessing stage, so that you literally expand the sample size of your input, and then feeding all of those augmented images, along with the originals, into the CNN, so the network sees the same images every epoch. Is this a valid idea? Has it been done, and what are the drawbacks? Are there any logical fallacies here, or objections from data science and machine learning experts?
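To make the offline version concrete, here is a minimal sketch, assuming NumPy arrays of shape (N, H, W, C) scaled to [0, 1]; the particular transforms (random flips plus pixel noise) are purely illustrative:

```python
import numpy as np

def augment_offline(x, y, copies=2, rng=None):
    """Expand (x, y) with `copies` randomly perturbed duplicates of each image.

    x: float array of shape (N, H, W, C) in [0, 1]; y: labels of shape (N,).
    """
    rng = rng or np.random.default_rng(0)
    xs, ys = [x], [y]  # keep the originals in the training set
    for _ in range(copies):
        flip = rng.random(len(x)) < 0.5           # flip roughly half the images
        aug = np.where(flip[:, None, None, None], x[:, :, ::-1, :], x)
        aug = np.clip(aug + rng.normal(0.0, 0.02, x.shape), 0.0, 1.0)
        xs.append(aug.astype(x.dtype))
        ys.append(y)
    return np.concatenate(xs), np.concatenate(ys)

# With copies=2 the stored dataset triples in size, and the network
# then sees exactly the same augmented images every epoch.
x = np.random.rand(10, 32, 32, 3).astype("float32")
y = np.random.randint(0, 2, size=(10,))
x_big, y_big = augment_offline(x, y)
print(x_big.shape)  # (30, 32, 32, 3)
```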
Thanks!
Yes, they are mostly referring to real-time data augmentation.
It is not necessarily true that raw images are never used; it depends on the augmentation function you are using. A random transform can leave an original image intact or distort it: a random horizontal flip applied with probability 0.5, for instance, leaves roughly half of the images unchanged in any given epoch.
Data augmentation is done in real time virtually all of the time, for the following reason: the augmented data would be too big to store on your hard drive. Even the smallest image datasets could take hundreds of GB after augmentation.
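To put rough, illustrative numbers on that: CIFAR-10 is only about 170 MB on disk, but storing 50 pre-augmented variants of every image would already take around 8.5 GB, and doing the same to the roughly 150 GB ImageNet training set would require on the order of 7.5 TB.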
Answered by Leevo on December 28, 2020