Data Science Asked on August 6, 2021
Should I perform data augmentation or normalization first in deep learning? I am mainly interested in 2D and 3D input data. In the tutorials I have seen so far, data augmentation always comes first. Is there a (mathematical) reason for that? Would it also work the other way around?
In the specific context of image/video problems, it may be okay to normalize the data before augmentation, because you already know that each feature (pixel) takes a value between 0 and 255. As long as you normalize w.r.t. this theoretical min and max, the order of normalization and augmentation shouldn't matter, at least for geometric augmentations such as flips and crops, which only move pixel values around.
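For instance, here is a minimal NumPy sketch (the pixel values are made up for illustration) showing that dividing by a fixed theoretical maximum commutes with a horizontal flip:

```python
# Minimal sketch: normalizing by a fixed theoretical range (255) commutes
# with a horizontal flip, so the order does not matter in this case.
import numpy as np

image = np.array([[0, 50], [100, 255]], dtype=np.float64)  # toy 2x2 "image"

normalize_then_flip = np.fliplr(image / 255.0)   # normalize first, flip second
flip_then_normalize = np.fliplr(image) / 255.0   # flip first, normalize second

print(np.allclose(normalize_then_flip, flip_then_normalize))  # True
```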
But in general, data augmentation should always come first. Otherwise, the normalized features may be incorrect after the augmentation operators are applied. After all, you can think of augmentation as gathering additional data: a normalization computed on your original dataset may no longer be valid once that additional data has been gathered.
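This ordering is also what you see in typical training pipelines. For example, a standard torchvision transform chain (assuming PyTorch/torchvision are available; the mean/std below are the commonly used ImageNet statistics) applies the random augmentation to the raw image and normalizes last:

```python
# A typical torchvision pipeline: augmentation operates on the raw image,
# and normalization is the last step before the model sees the tensor.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),           # augmentation first
    transforms.ToTensor(),                            # scales pixels to [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),  # normalization last
])
```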
To illustrate, suppose we have a dataset consisting of three 2x2 matrices like this: $$ \begin{bmatrix} 0 & 0 \\ 0 & 200 \end{bmatrix} \quad \begin{bmatrix} 50 & 100 \\ 200 & 0 \end{bmatrix} \quad \begin{bmatrix} 100 & 200 \\ 0 & 0 \end{bmatrix} $$
And suppose we decide that one good way to augment the data is to flip one or more of the images horizontally. If we do that for the third image before normalizing, the dataset becomes: $$ \begin{bmatrix} 0 & 0 \\ 0 & 200 \end{bmatrix} \quad \begin{bmatrix} 50 & 100 \\ 200 & 0 \end{bmatrix} \quad \begin{bmatrix} 100 & 200 \\ 0 & 0 \end{bmatrix} \quad \begin{bmatrix} 200 & 100 \\ 0 & 0 \end{bmatrix} $$
And after normalizing each feature (pixel position) to [0, 1] by its minimum and maximum across the dataset, the dataset will be: $$ \begin{bmatrix} 0.0 & 0.0 \\ 0.0 & 1.0 \end{bmatrix} \quad \begin{bmatrix} 0.25 & 0.5 \\ 1.0 & 0.0 \end{bmatrix} \quad \begin{bmatrix} 0.5 & 1.0 \\ 0.0 & 0.0 \end{bmatrix} \quad \begin{bmatrix} 1.0 & 0.5 \\ 0.0 & 0.0 \end{bmatrix} $$
This is the correct encoding of our dataset, and it can be transformed back into the original feature space without errors.
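Here is a short NumPy sketch that reproduces these numbers, augmenting first and then computing the per-feature min-max normalization over the augmented dataset:

```python
# Augment first, then normalize each feature (pixel position) by its
# min and max over the *augmented* dataset.
import numpy as np

dataset = np.array([
    [[0, 0], [0, 200]],
    [[50, 100], [200, 0]],
    [[100, 200], [0, 0]],
], dtype=np.float64)

# Augmentation: append a horizontally flipped copy of the third image.
augmented = np.concatenate([dataset, np.fliplr(dataset[2])[None]], axis=0)

# Per-feature min-max normalization, computed on the augmented dataset.
lo = augmented.min(axis=0)
hi = augmented.max(axis=0)
normalized = (augmented - lo) / (hi - lo)

print(normalized[2])  # [[0.5 1. ] [0.  0. ]] -- matches the matrices above
print(normalized[3])  # [[1.  0.5] [0.  0. ]] -- still the flip of the third
```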
Now what happens if we normalize first, then apply our augmentation operators?
Again we start with our dataset of three matrices: $$ \begin{bmatrix} 0 & 0 \\ 0 & 200 \end{bmatrix} \quad \begin{bmatrix} 50 & 100 \\ 200 & 0 \end{bmatrix} \quad \begin{bmatrix} 100 & 200 \\ 0 & 0 \end{bmatrix} $$
This time we normalize first. Notice that the top-left feature has the range [0, 100] in our dataset, while all other features have the range [0, 200]. So after normalization the dataset becomes: $$ \begin{bmatrix} 0.0 & 0.0 \\ 0.0 & 1.0 \end{bmatrix} \quad \begin{bmatrix} 0.5 & 0.5 \\ 1.0 & 0.0 \end{bmatrix} \quad \begin{bmatrix} 1.0 & 1.0 \\ 0.0 & 0.0 \end{bmatrix} $$
Now let's augment the dataset like we did before, by flipping the last matrix horizontally. The augmented dataset is: $$ \begin{bmatrix} 0.0 & 0.0 \\ 0.0 & 1.0 \end{bmatrix} \quad \begin{bmatrix} 0.5 & 0.5 \\ 1.0 & 0.0 \end{bmatrix} \quad \begin{bmatrix} 1.0 & 1.0 \\ 0.0 & 0.0 \end{bmatrix} \quad \begin{bmatrix} 1.0 & 1.0 \\ 0.0 & 0.0 \end{bmatrix} $$
Oh no! The last two matrices are identical in the normalized space, even though we know they should represent two different instances. And if we try to map these matrices back into the original feature space, we get incorrect results: the last two matrices come back identical, even though one should be the horizontal flip of the other.
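The same NumPy sketch with the two steps swapped reproduces the collision:

```python
# Normalize first (per-feature min-max over the original three images),
# then flip: the flipped image collapses onto the original.
import numpy as np

dataset = np.array([
    [[0, 0], [0, 200]],
    [[50, 100], [200, 0]],
    [[100, 200], [0, 0]],
], dtype=np.float64)

# Per-feature min-max normalization, computed BEFORE augmentation.
lo = dataset.min(axis=0)
hi = dataset.max(axis=0)
normalized = (dataset - lo) / (hi - lo)

# Augmentation in the normalized space.
flipped = np.fliplr(normalized[2])
print(np.array_equal(flipped, normalized[2]))  # True -- the flip is lost
```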
Usually, the errors would not be this extreme, but this example shows why it is dangerous to normalize before augmenting: after augmentation, the features may no longer "line up" with the statistics the normalization was computed from.
Answered by zachdj on August 6, 2021