Difference between cell state and hidden state

Data Science Asked by user105907 on April 29, 2021

LSTM cells consist of two types of states, the cell state and hidden state.

How do cell and hidden states differ, in terms of their functionality? What information do they carry?


2 Answers

In short:

  • Cell state: long-term memory of the model; present only in LSTM models
  • Hidden state: working memory; present in both LSTM and RNN models

Additional Information

RNN and vanishing/exploding gradients

Traditional Recurrent Neural Networks (RNNs) can model sequential events by propagating information through time, i.e. via forward and backward propagation. This is achieved by "connecting" consecutive steps through the hidden state:

$a_n = f(W_n, a_{n-1}, x_n)$

The hidden state $a_n$ carries information from the past by applying a (typically nonlinear) transformation of the previous hidden state and the current input.
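As an illustration, here is a minimal NumPy sketch of this update rule (the weight names `W_aa`, `W_ax`, `b` and the tanh nonlinearity are assumptions chosen for the example, not something specified above):

```python
import numpy as np

def rnn_step(a_prev, x, W_aa, W_ax, b):
    """One vanilla RNN step: the new hidden state is a nonlinear
    function of the previous hidden state and the current input."""
    return np.tanh(W_aa @ a_prev + W_ax @ x + b)

# Toy dimensions: hidden size 4, input size 3.
rng = np.random.default_rng(0)
W_aa, W_ax, b = rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), np.zeros(4)

a = np.zeros(4)                        # initial hidden state
for x in rng.normal(size=(5, 3)):      # a sequence of 5 inputs
    a = rnn_step(a, x, W_aa, W_ax, b)  # hidden state is overwritten at every step
```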

Despite being a very successful architecture, RNNs suffer from vanishing/exploding gradients. Every previous step enters the backpropagation calculation (how wrong the prediction has been), because of the chain rule embedded in $a_n$:

$a_n = f(W_n, a_{n-1}, x_n) = f(W_n, f(W_{n-1}, a_{n-2}, x_{n-1}), x_n)$, since $a_{n-1} = f(W_{n-1}, a_{n-2}, x_{n-1})$.
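A toy sketch of why this nesting matters: in a scalar linear recurrence (an assumption made purely for illustration), the gradient of $a_n$ with respect to $a_0$ is a product of one factor per step, so it shrinks or blows up exponentially with the sequence length.

```python
# Toy illustration: in a scalar linear recurrence a_t = w * a_{t-1},
# the chain rule gives d a_n / d a_0 = w ** n (one factor per step).
for w in (0.9, 1.1):              # per-step derivative slightly below / above 1
    grad = 1.0
    for _ in range(100):          # 100 time steps
        grad *= w
    print(f"w={w}: gradient after 100 steps ~ {grad:.1e}")
# w=0.9 -> ~2.7e-05 (vanishing), w=1.1 -> ~1.4e+04 (exploding)
```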

To summarise: RNNs are great, but issues occur with long-term dependencies because of the chain rule acting on their hidden state.

LSTM and the cell state

To alleviate the issues above, LSTM architectures introduce the cell state, in addition to the existing hidden state of RNNs. The cell state gives the model a longer memory of past events. This long-term memory capability is enabled by

  • the storage of useful beliefs from new inputs
  • the loading of beliefs that are immediately useful into the working memory (i.e. the hidden state).

In case you wonder "how does it know what to store or what's immediately useful?": these are trainable weights that are learned during training. Consider them an additional piece of muscle that learns this new activity of storing and loading by being trained on examples (i.e. labelled data points).
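For concreteness, below is a minimal NumPy sketch of one LSTM step using the standard gate formulation (the specific weight layout and variable names are assumptions made for this example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps the concatenation [h_prev; x] to the four gate pre-activations."""
    z = W @ np.concatenate([h_prev, x]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget / input / output gates
    g = np.tanh(g)                                # candidate "beliefs" from the new input
    c = f * c_prev + i * g    # cell state: keep part of the old memory, store part of the new
    h = o * np.tanh(c)        # hidden state: load the part of memory that is useful right now
    return h, c

# Toy usage: hidden size 4, input size 3.
H, D = 4, 3
rng = np.random.default_rng(0)
W, b = rng.normal(size=(4 * H, H + D)), np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):
    h, c = lstm_step(x, h, c, W, b)
```

The forget and input gates decide what to keep in and what to store into the cell state (long-term memory), while the output gate decides what to load from it into the hidden state (working memory); all of these weights live in W and are learned from the labelled examples.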

To summarise: LSTMs are usually better at dealing with long term dependencies, because of their capacity to store and load beliefs that are important at different parts of the sequence.


TLDR:

hidden state:

  • Working memory capability that carries information from the immediately preceding events and is overwritten uncontrollably at every step; present in both RNNs and LSTMs

cell state:

  • Long-term memory capability that stores and loads information from events that are not necessarily immediately previous
  • present in LSTMs

GRUs are also very relevant but are excluded from the response.

Answered by hH1sG0n3 on April 29, 2021

RNNs are not able to retain memories from far back in the past because of the vanishing gradient problem (i.e. the gradient from backpropagation is unable to reach the earlier states). This is a limitation of the model itself. Thus, we need to introduce a more powerful model, i.e. lower the bias (at the expense of increasing the variance).

Introducing the cell state into LSTM cells increases the complexity of the model. As you know, increasing complexity usually increases variance and decreases bias. The cell state acts as a highway along which the gradient can flow more easily to the earlier states, which in turn allows the model to capture memories that are further back in the past.
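To see the two states side by side, here is a small PyTorch sketch (assuming PyTorch is installed; the layer sizes are arbitrary toy values): `nn.LSTM` returns both a hidden state and a cell state, while a vanilla `nn.RNN` only has the hidden state.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=3, hidden_size=4, batch_first=True)
x = torch.randn(1, 5, 3)          # (batch, sequence length, features)

output, (h_n, c_n) = lstm(x)      # the LSTM exposes hidden AND cell state
print(output.shape)               # torch.Size([1, 5, 4]): hidden state at every step
print(h_n.shape, c_n.shape)       # torch.Size([1, 1, 4]) each: final hidden / cell state

rnn = nn.RNN(input_size=3, hidden_size=4, batch_first=True)
output, h_n = rnn(x)              # a vanilla RNN only has the hidden state
```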

Answered by Alvin on April 29, 2021
