Cross Validated Asked by I. A on January 29, 2021
While training my convolutional neural network to predict emotions, I plotted the training loss and the validation loss side by side. The training loss appears to decrease over time, whereas the validation loss behaves strangely. Below is the figure that I obtained while training the model in three different ways.
Please note that the model is composed of 4 convolutional layers followed by a recurrent neural network (GRU) which is responsible for detecting sequences in the input data.
The light blue curve corresponds to the model where the images are fed into the model in order, from frame 0 to frame 7500, without shuffling on each epoch or in each training step. In this case, the initial state of the RNN was set equal to the output (the last state) of the RNN from the previous training step.
The red curve corresponds to the model where the images are fed into the model in order as well (same as the previous case), without shuffling, but the initial state of the RNN at each training step is set to 0.
The dark blue curve corresponds to a model where the images are fed into the model randomly (the starting frame is chosen randomly) and the number of frames is chosen randomly as well. In this case, the initial state of the RNN was also set to zero.
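To make the difference between the three setups concrete, here is a minimal sketch in plain Python. It is not the actual training code: `toy_rnn` is a hypothetical stand-in for the GRU (a leaky running average), the window length and the constant input are made-up values, and the helper names are chosen only for illustration. It only shows how the frames are windowed and when the state is reset.

```python
import random

def sequential_windows(n_frames, window):
    """Yield (start, end) index pairs covering the frames in order."""
    for start in range(0, n_frames, window):
        yield start, min(start + window, n_frames)

def random_window(n_frames, min_len, max_len, rng):
    """Pick a random start frame and a random length (dark blue setup)."""
    length = rng.randint(min_len, max_len)
    start = rng.randint(0, n_frames - length)
    return start, start + length

def toy_rnn(frames, state):
    """Hypothetical stand-in for the GRU: returns the last state."""
    for x in frames:
        state = 0.9 * state + 0.1 * x
    return state

def run_epoch(frames, window, carry_state):
    """carry_state=True mimics the light blue setup, False the red one."""
    state = 0.0
    final_states = []
    for start, end in sequential_windows(len(frames), window):
        if not carry_state:
            state = 0.0  # reset to zero at every training step
        state = toy_rnn(frames[start:end], state)
        final_states.append(state)
    return final_states

frames = [1.0] * 100  # made-up constant input, for illustration only
carried = run_epoch(frames, window=10, carry_state=True)   # light blue
reset = run_epoch(frames, window=10, carry_state=False)    # red
```

With the state carried over, each window starts from where the previous one ended, so the final states differ from window to window; with the reset, every window starts from zero. In an autodiff framework the carried state would usually also be detached from the previous step's graph, which this sketch does not model.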
Therefore, I would like to know whether the shape of the validation loss curve is reasonable. To me the way it fluctuates doesn't make sense, and maybe the only reasonable curve is the dark blue one (we can assume that the validation loss starts increasing once the model overfits).
Do the light blue and the red curves indicate an error or mistake in the model? Or is the data so noisy that I'm getting these fluctuating curves?
I am using MSE as a loss function.
Below is the loss on the training dataset.
Any help is much appreciated!!
If you are performing a classification task, you should not use the MSE loss function. MSE works well for regression tasks, but when it is used for classification (for example, on top of a sigmoid or softmax output) it leads to a non-convex optimization problem.
Try using the binary cross-entropy or the (categorical) cross-entropy loss function instead.
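As a rough illustration of why this matters, the sketch below compares the gradient of the two losses with respect to the logit for a single sigmoid output. The function names and the example values (`z = -6`, `y = 1`) are made up for illustration; the formulas are the standard derivatives of squared error and binary cross-entropy through a sigmoid.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse_grad(z, y):
    """d/dz of (sigmoid(z) - y)^2."""
    p = sigmoid(z)
    return 2.0 * (p - y) * p * (1.0 - p)

def bce_grad(z, y):
    """d/dz of binary cross-entropy with a sigmoid output."""
    return sigmoid(z) - y

# A confidently wrong prediction: true label 1, strongly negative logit.
z, y = -6.0, 1.0
print(f"MSE gradient: {mse_grad(z, y):.5f}")
print(f"BCE gradient: {bce_grad(z, y):.5f}")
```

For a confidently wrong prediction like this one, the MSE gradient is much smaller in magnitude than the cross-entropy gradient, because the extra `p * (1 - p)` factor vanishes when the sigmoid saturates. That makes it harder for the model to recover from bad predictions when trained with MSE.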
I have answered to the best of my knowledge, and I hope it helps. Happy coding!
Answered by Alluri L S V Siddhartha Varma on January 29, 2021