Is it always better to use a stacked LSTM than a single LSTM?

Question

I am currently studying LSTM and RNNs.

I came across several concepts like Multidimensional LSTM and Stacked LSTM.

I have used a stacked LSTM, and it gives me better performance than a single LSTM. As I understand it, increasing the depth of an LSTM also increases the number of hidden units. Doesn't that mean overfitting? Then why am I getting better results?

[Note: I have used BatchNorm and Dropout after every stacked LSTM layer.]

Henry Lidgley · Answer

From https://machinelearningmastery.com/stacked-long-short-term-memory-networks/:

"Stacking LSTM hidden layers makes the model deeper, more accurately earning the description as a deep learning technique ... The additional hidden layers are understood to recombine the learned representation from prior layers and create new representations at high levels of abstraction. For example, from lines to shapes to objects ... A sufficiently large single hidden layer Multilayer Perceptron can be used to approximate most functions. Increasing the depth of the network provides an alternate solution that requires fewer neurons and trains faster. Ultimately, adding depth it is a type of representational optimization."

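Increasing the number of layers or hidden units in a neural network doesn't necessarily result in overfitting. With too few, the model underfits: both training and test accuracy are low. With too many, it overfits: training accuracy is high but test accuracy is low. Somewhere in between is the right number of hidden layers and units for the problem, and the usual way to find it is to compare performance on held-out validation data. Regularization such as the Dropout you mention also reduces the risk of overfitting, which may be part of why your deeper model does better. Some complex problems, such as many NLP tasks, benefit from several stacked LSTM layers: http://ruder.io/deep-learning-nlp-best-practices/.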
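To make the "deeper does not automatically mean bigger" point concrete, here is a rough sketch that counts trainable parameters for a single wide LSTM and for a stack of two narrower ones. It assumes the standard LSTM cell with four gate blocks and one bias vector per gate; the function names, the 32 input features, and the layer sizes are made up for illustration, and some libraries store an extra bias vector per gate, so exact counts can differ slightly.

```python
from typing import Sequence


def lstm_layer_params(input_dim: int, hidden_units: int) -> int:
    """Trainable parameters in one LSTM layer.

    Assumption: four gate blocks (input, forget, cell candidate, output),
    each with an input weight matrix, a recurrent weight matrix, and a
    single bias vector. Libraries that keep two bias vectors per gate
    add 4 * hidden_units more parameters per layer.
    """
    per_gate = hidden_units * (input_dim + hidden_units) + hidden_units
    return 4 * per_gate


def stacked_lstm_params(input_dim: int, layer_sizes: Sequence[int]) -> int:
    """Total parameters for LSTM layers stacked one on top of another."""
    total = 0
    for hidden_units in layer_sizes:
        total += lstm_layer_params(input_dim, hidden_units)
        # The next layer reads the hidden state of this one as its input.
        input_dim = hidden_units
    return total


features = 32  # hypothetical number of input features per time step

wide = stacked_lstm_params(features, [128])     # one layer, 128 units
deep = stacked_lstm_params(features, [64, 64])  # two layers, 64 units each

print("single LSTM, 128 units:", wide)    # 82432
print("stacked LSTM, 2 x 64 units:", deep)  # 57856
```

Both models have 128 hidden units in total, but under these assumptions the stacked one has fewer parameters to fit. Parameter count is only a crude proxy for the risk of overfitting, so the comparison that really matters is still validation performance.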

This thread might be useful: https://ai.stackexchange.com/questions/3156/how-to-select-number-of-hidden-layers-and-number-of-memory-cells-in-an-lstm
