Data Science Asked by Shonix3373 on April 16, 2021
I trained several classification models using Keras with different numbers of hidden layers and the same number of neurons in each layer. I found that the models' accuracy decreased as the number of hidden layers increased, and the decrease was more pronounced at larger numbers of hidden layers. The following graph shows the accuracy of the models as the number of hidden layers changes while the rest of the parameters stay the same (each model has 64 neurons in each hidden layer):
[Graph: model accuracy vs. number of hidden layers]
My question is: why is the drop in accuracy between 8 hidden layers and 16 hidden layers much greater than the drop between 1 hidden layer and 8 hidden layers, even though the difference in the number of hidden layers is the same (8)?
As you add more parameters, your model is probably overfitting the training data; that is, it is memorizing the training data and therefore generalizes worse on the test/validation data.
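A minimal sketch of how to observe this, assuming MNIST as the dataset and 64-unit Dense hidden layers as in the question (the dataset, optimizer, and epoch count here are my own illustrative choices, not necessarily the asker's exact setup): compare training and validation accuracy as depth grows and watch the gap widen.

```python
# Sketch: train models of increasing depth and compare train vs. validation
# accuracy. A widening gap between the two is the signature of overfitting.
from tensorflow import keras

(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

for depth in (1, 8, 16):
    model = keras.Sequential(
        [keras.Input(shape=(784,))]
        + [keras.layers.Dense(64, activation="relu") for _ in range(depth)]
        + [keras.layers.Dense(10, activation="softmax")]
    )
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_train, y_train, epochs=10,
                        validation_split=0.2, verbose=0)
    print(f"{depth:2d} hidden layers: "
          f"train acc={history.history['accuracy'][-1]:.3f}, "
          f"val acc={history.history['val_accuracy'][-1]:.3f}")
```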
Answered by noe on April 16, 2021
Neural networks try to approximate the function that maps a given image to its label. Changing the number of hidden layers essentially means changing the number of parameters used to approximate that input-label mapping. These parameters include weights and biases (in the case of Dense layers); a sketch of how the parameter count scales with depth follows the list below. Depending on the number of parameters, the model might:
Underfit, meaning it has fewer parameters than required to solve the problem. This lowers the model's accuracy on both the training and test datasets.
Overfit, meaning the model has excess parameters and tends to memorize the training dataset. In this case, the model's accuracy on the training data is much higher than on the test data.
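As a minimal sketch of the parameter-count point (assuming an MNIST-sized 784-dimensional input and the question's 64-unit Dense layers; these are my assumptions, not the asker's exact setup), you can count the parameters directly:

```python
# Sketch: how the parameter count scales with depth for 64-unit Dense layers.
# A Dense layer with n inputs and m units has n*m weights plus m biases.
from tensorflow import keras

for depth in (1, 8, 16):
    model = keras.Sequential(
        [keras.Input(shape=(784,))]  # assumed input size (flattened MNIST)
        + [keras.layers.Dense(64, activation="relu") for _ in range(depth)]
        + [keras.layers.Dense(10, activation="softmax")]
    )
    print(f"{depth:2d} hidden layers -> {model.count_params():,} parameters")
```

Each extra 64-unit layer here adds 64*64 + 64 = 4,160 parameters, so depth grows the parameter count linearly.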
Apart from these cases, if the model's loss and accuracy (or any other metric of model performance) improve on both the training and test data up to a certain number of epochs, we say that the model generalizes well.
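One common way to train only as long as the model keeps generalizing (my own suggestion; the answer does not prescribe a method) is Keras's EarlyStopping callback, sketched here; the patience value is an illustrative choice:

```python
from tensorflow import keras

# Stop training once validation loss stops improving, and roll back to the
# epoch whose weights generalized best.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",        # watch generalization, not training loss
    patience=3,                # allow 3 stagnant epochs before stopping
    restore_best_weights=True, # keep the best-generalizing weights
)

# Usage with a compiled model and the MNIST arrays from the earlier snippet:
# model.fit(x_train, y_train, epochs=100,
#           validation_split=0.2, callbacks=[early_stop])
```

restore_best_weights=True matters here: without it, you keep the weights from the final (possibly overfitted) epoch rather than the best-generalizing one.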
It depends on the problem you're trying to solve. CNNs like InceptionV3, MobileNet, and ResNet have tens to hundreds of hidden layers because they are trained on huge image datasets like CelebA or ImageNet. They need millions of parameters to generalize and obtain good accuracy on the test data.
You may find a clearer definition of overfitting here.
My question is: why is the drop in accuracy between 8 hidden layers and 16 hidden layers much greater than the drop between 1 hidden layer and 8 hidden layers, even though the difference in the number of hidden layers is the same (8)?
Probably the number of parameters in 8 hidden layers was just about right for the model to generalize. When the number of hidden layers was increased to 16, the model had excess parameters and overfitted the training data.
How do we determine the number of hidden layers a model requires?
One way is to perform hyperparameter optimization: search for the best possible combination of hyperparameters (number of layers, learning rate, dropout rate, and so on).
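As a sketch, assuming the KerasTuner library (my choice of tool; the answer doesn't name one), a search over the number of hidden layers, learning rate, and dropout rate might look like this:

```python
import keras_tuner as kt  # assumes the keras-tuner package is installed
from tensorflow import keras

def build_model(hp):
    model = keras.Sequential([keras.Input(shape=(784,))])
    # Search over depth, keeping the question's 64-unit hidden layers.
    for _ in range(hp.Int("num_hidden_layers", min_value=1, max_value=16)):
        model.add(keras.layers.Dense(64, activation="relu"))
        model.add(keras.layers.Dropout(hp.Float("dropout_rate", 0.0, 0.5)))
    model.add(keras.layers.Dense(10, activation="softmax"))
    model.compile(
        optimizer=keras.optimizers.Adam(
            learning_rate=hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=10)
# tuner.search accepts the same arguments as model.fit; typical usage with
# the MNIST arrays from the earlier snippet:
# tuner.search(x_train, y_train, epochs=5, validation_split=0.2)
# best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
# print(best_hp.get("num_hidden_layers"))
```

RandomSearch samples hyperparameter combinations at random; swapping in kt.Hyperband or kt.BayesianOptimization changes the search strategy without changing build_model.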
Answered by Shubham Panchal on April 16, 2021