Asked by user105907 on April 29, 2021
LSTM cells consist of two types of states, the cell state and hidden state.
How do cell and hidden states differ, in terms of their functionality? What information do they carry?
Traditional Recurrent Neural Networks (RNNs) model sequential events by propagating information through time, i.e. via forward and backward propagation. This is achieved by "connecting" consecutive time steps through the hidden state:
$a_n = f(W, a_{n-1}, x_n)$
The hidden state $a_n$ carries past information forward: at every step it combines the previous hidden state and the current input through a weighted transformation followed by a nonlinearity (the weights $W$ are shared across time steps).
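To make the recurrence concrete, here is a minimal NumPy sketch of a single hidden-state update (the weight names `W_a` and `W_x`, the tanh nonlinearity, and the toy dimensions are illustrative assumptions, not part of the original answer):

```python
import numpy as np

def rnn_step(a_prev, x, W_a, W_x, b):
    """One vanilla RNN step: combine the previous hidden state and the
    current input, then squash with tanh to get the new hidden state."""
    return np.tanh(W_a @ a_prev + W_x @ x + b)

# Toy dimensions: hidden size 4, input size 3 (arbitrary choices).
rng = np.random.default_rng(0)
W_a = rng.normal(size=(4, 4)) * 0.1   # recurrent weights, shared across steps
W_x = rng.normal(size=(4, 3)) * 0.1   # input weights
b = np.zeros(4)

a = np.zeros(4)                       # initial hidden state a_0
for x in rng.normal(size=(5, 3)):     # a short sequence of 5 inputs
    a = rnn_step(a, x, W_a, W_x, b)   # a_n depends on a_{n-1} and x_n
```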
Despite being a very successful architecture, RNNs suffer from vanishing/exploding gradients. Because of the chain rule baked into $a_n$, every previous step enters the backpropagation calculation (the measure of how wrong the prediction was):
$a_n = f(W, a_{n-1}, x_n) = f(W, f(W, a_{n-2}, x_{n-1}), x_n)$, since $a_{n-1} = f(W, a_{n-2}, x_{n-1})$.
To summarise: RNNs are great, but they struggle with long-term dependencies because of the chain rule acting through their hidden state.
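The effect can be seen numerically. Below is a toy sketch (purely illustrative; the dimensions, weight scale and number of steps are made-up assumptions) that multiplies the step-to-step Jacobians of a tanh RNN and watches the product shrink:

```python
import numpy as np

# Backpropagating through n steps multiplies n Jacobians of the transition,
# d a_n / d a_{n-1} = diag(tanh'(z_n)) @ W. If the factors have norm < 1,
# the product shrinks exponentially (vanishing gradients); large weights
# can instead make it blow up (exploding gradients).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4)) * 0.1        # small, shared recurrent weights
product = np.eye(4)
norms = []
for step in range(50):
    z = rng.normal(size=4)               # stand-in pre-activations
    jacobian = np.diag(1 - np.tanh(z) ** 2) @ W
    product = product @ jacobian
    norms.append(np.linalg.norm(product))

print(norms[::10])                       # the norm decays towards zero
```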
To alleviate the issues above, LSTM architectures introduce the cell state, in addition to the existing hidden state of RNNs. The cell state gives the model a longer memory of past events. This long-term memory is enabled by a set of gates (forget, input and output gates) that decide, at each step, what to discard from the cell state, what new information to write into it, and what part of it to expose through the hidden state.
In case you wonder "how does it know what to store or what's immediately useful?": remember that these gates are trainable weights that are learned during training. Consider them an additional piece of muscle that learns this storing-and-loading behaviour by training on examples (i.e. labelled data points).
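A minimal NumPy sketch of one LSTM step may help. It follows the standard gate equations; the single stacked weight matrix `W`, the toy dimensions and the variable names are assumptions made for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev; x] to the four gate pre-activations.
    The cell state c is the long-term memory; the hidden state h is what
    the cell exposes to the next layer / time step."""
    z = W @ np.concatenate([h_prev, x]) + b
    f, i, o, g = np.split(z, 4)
    f = sigmoid(f)                 # forget gate: what to keep from c_prev
    i = sigmoid(i)                 # input gate: what to write into the cell
    o = sigmoid(o)                 # output gate: what to expose as h
    g = np.tanh(g)                 # candidate values to store
    c = f * c_prev + i * g         # cell state: largely additive update
    h = o * np.tanh(c)             # hidden state: gated read-out of the cell
    return h, c

# Toy dimensions: hidden size 4, input size 3.
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * 4, 4 + 3)) * 0.1
b = np.zeros(4 * 4)

h, c = np.zeros(4), np.zeros(4)
for x in rng.normal(size=(5, 3)):          # a short input sequence
    h, c = lstm_step(x, h, c, W, b)
```

Note how the cell state `c` is updated almost additively (`f * c_prev + i * g`), while the hidden state `h` is just a gated read-out of it.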
To summarise: LSTMs are usually better at dealing with long-term dependencies, because of their capacity to store and retrieve information that matters at different points in the sequence.
TLDR: the hidden state is the short-term, per-step working memory that the cell outputs, while the cell state is the gated long-term memory that carries information across many steps.
GRUs are also very relevant but are excluded from the response.
Answered by hH1sG0n3 on April 29, 2021
An RNN is not able to retain memories from far back in the past because of the vanishing gradient problem (i.e. the gradient from backpropagation is unable to reach the earlier states). This is a limitation of the model itself. Thus, we need a more powerful model, i.e. one with lower bias (at the expense of higher variance).
Introducing the cell state into the LSTM cell increases the complexity of the model. As you know, increasing complexity usually increases variance and decreases bias. The cell state acts as a highway that lets the gradient flow more easily back to the earlier states, which in turn allows the model to capture memories from further back in the past.
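As a rough illustration of the "highway" effect, the PyTorch sketch below backpropagates from the last output of a vanilla RNN and of an LSTM and compares the gradient that reaches the very first input (the sequence length, sizes and seed are arbitrary assumptions; the exact numbers will vary from run to run):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, input_size, hidden_size = 100, 8, 16
x = torch.randn(1, seq_len, input_size, requires_grad=True)

def grad_at_first_step(model):
    """Backprop from the last output and measure the gradient reaching x_1."""
    out, _ = model(x)                    # works for both nn.RNN and nn.LSTM
    out[:, -1].sum().backward()
    g = x.grad[:, 0].norm().item()
    x.grad = None                        # reset for the next model
    return g

rnn = nn.RNN(input_size, hidden_size, batch_first=True)
lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

print("vanilla RNN:", grad_at_first_step(rnn))
print("LSTM       :", grad_at_first_step(lstm))
# For long sequences the gradient reaching x_1 through the vanilla RNN is
# typically far smaller, because it passes through ~seq_len multiplicative
# factors, whereas the LSTM's largely additive cell-state update preserves it.
```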
Answered by Alvin on April 29, 2021