Data Science Asked by Stephanie Lin on December 28, 2020
When people talk about and use data augmentation, are they mostly referring to real-time data augmentation? In the case of image classification, that would mean augmenting the data on the fly during fitting, so a new randomly augmented version of each image is used every epoch. In this case only augmented images are used to train the model and the raw image is never seen, so the size of the training set doesn't actually change.
But what about non-real-time data augmentation? By this, I mean augmenting the data in the preprocessing stage, so that you literally expand the sample size of your input. You then feed all those augmented images, along with the originals, into the CNN, so the model sees the same images every epoch. Is this a valid idea? Has it been done, and what are the drawbacks? Any logical fallacies in it, or objections from data science and machine learning experts?
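And here is a rough sketch of the offline variant I'm asking about (the paths and the number of copies per image are arbitrary placeholders):

```python
from pathlib import Path
from PIL import Image
import torchvision.transforms as T

# Each call re-samples the random transforms, producing one new variant.
augment = T.Compose([T.RandomHorizontalFlip(), T.RandomRotation(15)])

src = Path("data/train")             # placeholder paths
dst = Path("data/train_augmented")
copies_per_image = 4                 # arbitrary choice for illustration

for img_path in src.rglob("*.jpg"):
    out_dir = dst / img_path.parent.name
    out_dir.mkdir(parents=True, exist_ok=True)
    img = Image.open(img_path).convert("RGB")
    img.save(out_dir / img_path.name)  # keep the original alongside the copies
    for i in range(copies_per_image):
        augment(img).save(out_dir / f"{img_path.stem}_aug{i}{img_path.suffix}")
```

The training set on disk ends up (copies_per_image + 1) times larger, and every epoch trains on exactly the same files.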
Thanks!
Yes, they are mostly referring to real-time data augmentation.
It's not quite right, though, that the raw images are never used: that depends on which data augmentation functions you apply. Because the transforms are usually random, an original image can pass through intact (e.g., a random flip that happens not to trigger) or come out distorted.
Data augmentation is done in real time pretty much 100% of the time, for a simple reason: the augmented data would be too big to store on your hard drive. Even a small image dataset can grow to hundreds of GB once you write many augmented copies of every image to disk.
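For a rough sense of scale, here is a back-of-envelope calculation; every number in it is an illustrative assumption, not a measured figure:

```python
# Illustrative assumptions only.
n_images = 50_000       # a modest dataset of photos
mb_per_image = 2.0      # average stored size of one image, in MB
copies = 20             # augmented variants written to disk per image

total_gb = n_images * mb_per_image * (1 + copies) / 1024
print(f"{total_gb:,.0f} GB")  # ~2,051 GB (about 2 TB) for originals plus copies
```

Generating the same variants on the fly costs a little CPU time per batch and zero extra disk space.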
Answered by Leevo on December 28, 2020