Data Science Asked on April 18, 2021
I am currently studying LSTMs and RNNs.
I came across several concepts, such as multidimensional LSTMs and stacked LSTMs.
I have used a stacked LSTM, and it gives me better performance than a single LSTM. As I understand it, increasing the depth of the LSTM also increases the total number of hidden units. Doesn't that mean overfitting? Why, then, am I getting better results?
[Note: I have used BatchNorm and Dropout after every stacked LSTM layer.]
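For reference, a minimal Keras sketch of the kind of architecture described above: stacked LSTM layers, each followed by BatchNormalization and Dropout. The input shape, layer widths, and dropout rate are placeholders, not the actual values used.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, BatchNormalization, Dropout, Dense

model = Sequential([
    # return_sequences=True passes the full sequence on to the next LSTM layer
    LSTM(64, return_sequences=True, input_shape=(100, 8)),
    BatchNormalization(),
    Dropout(0.2),
    LSTM(64, return_sequences=True),
    BatchNormalization(),
    Dropout(0.2),
    # the last LSTM returns only the final hidden state
    LSTM(64),
    BatchNormalization(),
    Dropout(0.2),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.summary()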
From https://machinelearningmastery.com/stacked-long-short-term-memory-networks/:
"Stacking LSTM hidden layers makes the model deeper, more accurately earning the description as a deep learning technique ... The additional hidden layers are understood to recombine the learned representation from prior layers and create new representations at high levels of abstraction. For example, from lines to shapes to objects ... A sufficiently large single hidden layer Multilayer Perceptron can be used to approximate most functions. Increasing the depth of the network provides an alternate solution that requires fewer neurons and trains faster. Ultimately, adding depth it is a type of representational optimization."
Increasing the number of layers or hidden units in a neural network doesn't necessarily result in overfitting. Too few will result in low training and test accuracy (underfitting); too many will result in high training accuracy but low test accuracy (overfitting). Somewhere in between lies the right number of hidden layers and units for the problem. Some complex problems, such as those in NLP, call for several stacked LSTM layers (see http://ruder.io/deep-learning-nlp-best-practices/).
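One practical way to find that middle ground is to train models of increasing depth on the same data and watch the gap between training and validation loss. The sketch below assumes Keras; the random toy data merely stands in for a real dataset.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def build_model(n_layers, units=32, timesteps=100, features=8):
    model = Sequential()
    for i in range(n_layers):
        kwargs = {"input_shape": (timesteps, features)} if i == 0 else {}
        # every LSTM except the last must return full sequences
        model.add(LSTM(units, return_sequences=(i < n_layers - 1), **kwargs))
    model.add(Dense(1))
    model.compile(optimizer="adam", loss="mse")
    return model

X = np.random.rand(500, 100, 8).astype("float32")
y = np.random.rand(500, 1).astype("float32")

for depth in (1, 2, 3):
    history = build_model(depth).fit(X, y, validation_split=0.2, epochs=5, verbose=0)
    # a widening gap between the two losses signals overfitting
    print(depth, history.history["loss"][-1], history.history["val_loss"][-1])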
This thread might be useful: https://ai.stackexchange.com/questions/3156/how-to-select-number-of-hidden-layers-and-number-of-memory-cells-in-an-lstm
Answered by Henry Lidgley on April 18, 2021