Asked by user105907 on April 29, 2021
LSTM cells consist of two types of states, the cell state and hidden state.
How do cell and hidden states differ, in terms of their functionality? What information do they carry?
Traditional Recurrent Neural Networks (RNNs) model sequential events by propagating information through time, i.e. via forward and backward propagation. This is achieved by "connecting" consecutive time steps through the hidden state:
$a_n = f(W, a_{n-1}, x_n)$
The hidden state $a_n$ carries past information forward: at every step it combines the previous hidden state and the current input through a weighted transformation followed by a nonlinearity (the weights $W$ are shared across time steps).
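To make the recurrence concrete, here is a minimal NumPy sketch of a single hidden-state update (the weight names `W_a` and `W_x`, the tanh nonlinearity, and the toy dimensions are illustrative assumptions, not part of the original answer):

```python
import numpy as np

def rnn_step(a_prev, x, W_a, W_x, b):
    """One vanilla RNN step: combine the previous hidden state and the
    current input, then squash with tanh to get the new hidden state."""
    return np.tanh(W_a @ a_prev + W_x @ x + b)

# Toy dimensions: hidden size 4, input size 3 (arbitrary choices).
rng = np.random.default_rng(0)
W_a = rng.normal(size=(4, 4)) * 0.1   # recurrent weights, shared across steps
W_x = rng.normal(size=(4, 3)) * 0.1   # input weights
b = np.zeros(4)

a = np.zeros(4)                       # initial hidden state a_0
for x in rng.normal(size=(5, 3)):     # a short sequence of 5 inputs
    a = rnn_step(a, x, W_a, W_x, b)   # a_n depends on a_{n-1} and x_n
```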
Despite being a very successful architecture, RNNs suffer from vanishing/exploding gradients. Because of the chain rule baked into $a_n$, every previous step enters the backpropagation calculation (the measure of how wrong the prediction was):
$a_n = f(W, a_{n-1}, x_n) = f(W, f(W, a_{n-2}, x_{n-1}), x_n)$, since $a_{n-1} = f(W, a_{n-2}, x_{n-1})$.
To summarise: RNNs are great, but they struggle with long-term dependencies because of the chain rule acting through their hidden state.
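The effect can be seen numerically. Below is a toy sketch (purely illustrative; the dimensions, weight scale and number of steps are made-up assumptions) that multiplies the step-to-step Jacobians of a tanh RNN and watches the product shrink:

```python
import numpy as np

# Backpropagating through n steps multiplies n Jacobians of the transition,
# d a_n / d a_{n-1} = diag(tanh'(z_n)) @ W. If the factors have norm < 1,
# the product shrinks exponentially (vanishing gradients); large weights
# can instead make it blow up (exploding gradients).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4)) * 0.1        # small, shared recurrent weights
product = np.eye(4)
norms = []
for step in range(50):
    z = rng.normal(size=4)               # stand-in pre-activations
    jacobian = np.diag(1 - np.tanh(z) ** 2) @ W
    product = product @ jacobian
    norms.append(np.linalg.norm(product))

print(norms[::10])                       # the norm decays towards zero
```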
To alleviate the issues above, LSTM architectures introduce the cell state, in addition to the existing hidden state of RNNs. The cell state gives the model a longer memory of past events. This long-term memory is enabled by a set of gates (forget, input and output gates) that decide, at each step, what to discard from the cell state, what new information to write into it, and what part of it to expose through the hidden state.
In case you wonder "how does it know what to store or what's immediately useful?": remember that these gates are trainable weights that are learned during training. Consider them an additional piece of muscle that learns this storing-and-loading behaviour by training on examples (i.e. labelled data points).
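A minimal NumPy sketch of one LSTM step may help. It follows the standard gate equations; the single stacked weight matrix `W`, the toy dimensions and the variable names are assumptions made for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev; x] to the four gate pre-activations.
    The cell state c is the long-term memory; the hidden state h is what
    the cell exposes to the next layer / time step."""
    z = W @ np.concatenate([h_prev, x]) + b
    f, i, o, g = np.split(z, 4)
    f = sigmoid(f)                 # forget gate: what to keep from c_prev
    i = sigmoid(i)                 # input gate: what to write into the cell
    o = sigmoid(o)                 # output gate: what to expose as h
    g = np.tanh(g)                 # candidate values to store
    c = f * c_prev + i * g         # cell state: largely additive update
    h = o * np.tanh(c)             # hidden state: gated read-out of the cell
    return h, c

# Toy dimensions: hidden size 4, input size 3.
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * 4, 4 + 3)) * 0.1
b = np.zeros(4 * 4)

h, c = np.zeros(4), np.zeros(4)
for x in rng.normal(size=(5, 3)):          # a short input sequence
    h, c = lstm_step(x, h, c, W, b)
```

Note how the cell state `c` is updated almost additively (`f * c_prev + i * g`), while the hidden state `h` is just a gated read-out of it.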
To summarise: LSTMs are usually better at dealing with long-term dependencies, because of their capacity to store and retrieve information that matters at different points in the sequence.
TLDR: the hidden state is the short-term, per-step working memory that the cell outputs, while the cell state is the gated long-term memory that carries information across many steps.
GRUs are also very relevant but are excluded from the response.
Answered by hH1sG0n3 on April 29, 2021
An RNN is not able to retain memories from far back in the past because of the vanishing gradient problem (i.e. the gradient from backpropagation is unable to reach the earlier states). This is a limitation of the model itself. Thus, we need a more powerful model, i.e. one with lower bias (at the expense of higher variance).
Introducing the cell state into the LSTM cell increases the complexity of the model. As you know, increasing complexity usually increases variance and decreases bias. The cell state acts as a highway that lets the gradient flow more easily back to the earlier states, which in turn allows the model to capture memories from further back in the past.
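As a rough illustration of the "highway" effect, the PyTorch sketch below backpropagates from the last output of a vanilla RNN and of an LSTM and compares the gradient that reaches the very first input (the sequence length, sizes and seed are arbitrary assumptions; the exact numbers will vary from run to run):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, input_size, hidden_size = 100, 8, 16
x = torch.randn(1, seq_len, input_size, requires_grad=True)

def grad_at_first_step(model):
    """Backprop from the last output and measure the gradient reaching x_1."""
    out, _ = model(x)                    # works for both nn.RNN and nn.LSTM
    out[:, -1].sum().backward()
    g = x.grad[:, 0].norm().item()
    x.grad = None                        # reset for the next model
    return g

rnn = nn.RNN(input_size, hidden_size, batch_first=True)
lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

print("vanilla RNN:", grad_at_first_step(rnn))
print("LSTM       :", grad_at_first_step(lstm))
# For long sequences the gradient reaching x_1 through the vanilla RNN is
# typically far smaller, because it passes through ~seq_len multiplicative
# factors, whereas the LSTM's largely additive cell-state update preserves it.
```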
Answered by Alvin on April 29, 2021