TransWikia.com

Calculation of Output in LSTM Many-to-One Architecture

Data Science Asked by Mei Lie on January 3, 2021

I’m new to recurrent neural networks and want to train an LSTM on my data, but I’m having trouble understanding the many-to-one LSTM architecture.
Suppose each sample has shape time_step x num_features, say 2 x 2, and I have to use a many-to-one LSTM architecture because I want to do classification. So at the last time step I add a dense layer (a) with a sigmoid activation function to predict the sequence class, which is 0 or 1.

My questions are,

  1. When I compute a, do I need to include all the hidden states (h1 and h2) or just the last hidden state h2?
  2. If I include only h2, how do I calculate the derivative of the loss function (cross-entropy) w.r.t. h1?

A derivation for h1 would be highly appreciated. Thank you 🙂

One Answer

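During training, examples are fed to the network in batches. At the end of each batch, the weights of all layers (Dense and LSTM) are updated. In a many-to-one setup only the last hidden state h2 is passed to the dense layer; the loss gradient still reaches h1 through the recurrent connection into the last time step (backpropagation through time).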
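
To address the derivation asked for in the question, here is a sketch for the 2-step case. It assumes the standard LSTM cell formulation, which neither post spells out, so the symbols are assumptions: W_* and U_* are the input and recurrent weight matrices of the forget (f), input (i), output (o) and candidate (g) gates, c_t is the cell state, and w, b are the dense layer's weights and bias. Only h2 feeds the dense output a, so h1 influences the loss solely through the four gates of step 2.

```latex
% Forward pass at the last step (t = 2), standard LSTM cell (assumed formulation)
\begin{aligned}
f_2 &= \sigma(W_f x_2 + U_f h_1 + b_f), &
i_2 &= \sigma(W_i x_2 + U_i h_1 + b_i), \\
o_2 &= \sigma(W_o x_2 + U_o h_1 + b_o), &
g_2 &= \tanh(W_g x_2 + U_g h_1 + b_g), \\
c_2 &= f_2 \odot c_1 + i_2 \odot g_2, &
h_2 &= o_2 \odot \tanh(c_2), \\
a &= \sigma(w^\top h_2 + b), &
L &= -\left[\, y \log a + (1 - y) \log(1 - a) \,\right].
\end{aligned}

% Backward pass from the loss to h_1
\begin{aligned}
\frac{\partial L}{\partial h_2} &= (a - y)\, w, \\
\frac{\partial L}{\partial c_2} &= \frac{\partial L}{\partial h_2} \odot o_2 \odot \left(1 - \tanh^2(c_2)\right), \\
\delta_o &= \frac{\partial L}{\partial h_2} \odot \tanh(c_2) \odot o_2 \odot (1 - o_2), \\
\delta_f &= \frac{\partial L}{\partial c_2} \odot c_1 \odot f_2 \odot (1 - f_2), \\
\delta_i &= \frac{\partial L}{\partial c_2} \odot g_2 \odot i_2 \odot (1 - i_2), \\
\delta_g &= \frac{\partial L}{\partial c_2} \odot i_2 \odot \left(1 - g_2^2\right), \\
\frac{\partial L}{\partial h_1} &= U_f^\top \delta_f + U_i^\top \delta_i + U_o^\top \delta_o + U_g^\top \delta_g.
\end{aligned}
```

Here c_1 is treated as a separate node of the graph; in full backpropagation through time it receives its own gradient, \partial L / \partial c_1 = \partial L / \partial c_2 \odot f_2, and both are then propagated into the parameters used at step 1.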

https://adventuresinmachinelearning.com/keras-lstm-tutorial/
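
As a rough illustration of how the loss gradient reaches h1 when only h2 is fed to the dense layer, the NumPy sketch below builds a tiny 2-step LSTM by hand and compares a hand-written backward pass with a finite-difference estimate. Everything in it is an assumption for illustration: the standard LSTM cell equations, the hidden size of 3, the random weights, and the helper names (`lstm_step`, `loss_from_h1`, `grad_h1`). It is not taken from the linked tutorial or from any library.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell; W, U, b are dicts keyed by gate name."""
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])   # forget gate
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])   # input gate
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])   # output gate
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])   # candidate values
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c, (f, i, o, g)

def loss_from_h1(h1, c1, x2, y, W, U, b, w_d, b_d):
    """Run step 2 and the dense sigmoid layer; only h2 reaches the output."""
    h2, c2, gates = lstm_step(x2, h1, c1, W, U, b)
    a = sigmoid(w_d @ h2 + b_d)
    loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))    # binary cross-entropy
    return loss, a, c2, gates

def grad_h1(h1, c1, x2, y, W, U, b, w_d, b_d):
    """Hand-written gradient of the loss w.r.t. h1 (c1 held fixed)."""
    _, a, c2, (f, i, o, g) = loss_from_h1(h1, c1, x2, y, W, U, b, w_d, b_d)
    dh2 = (a - y) * w_d                                  # through sigmoid + cross-entropy
    dc2 = dh2 * o * (1 - np.tanh(c2) ** 2)
    d_o = dh2 * np.tanh(c2) * o * (1 - o)                # pre-activation gradients
    d_f = dc2 * c1 * f * (1 - f)
    d_i = dc2 * g * i * (1 - i)
    d_g = dc2 * i * (1 - g ** 2)
    # h1 enters every gate of step 2 through the recurrent weights U
    return U["f"].T @ d_f + U["i"].T @ d_i + U["o"].T @ d_o + U["g"].T @ d_g

rng = np.random.default_rng(0)
n_features, n_hidden = 2, 3                              # hidden size is an arbitrary choice
gate_names = ("f", "i", "o", "g")
W = {k: rng.normal(scale=0.5, size=(n_hidden, n_features)) for k in gate_names}
U = {k: rng.normal(scale=0.5, size=(n_hidden, n_hidden)) for k in gate_names}
b = {k: np.zeros(n_hidden) for k in gate_names}
w_d, b_d = rng.normal(size=n_hidden), 0.0                # dense layer parameters

x = rng.normal(size=(2, n_features))                     # one sample: 2 time steps x 2 features
y = 1.0                                                  # its class label
h1, c1, _ = lstm_step(x[0], np.zeros(n_hidden), np.zeros(n_hidden), W, U, b)

analytic = grad_h1(h1, c1, x[1], y, W, U, b, w_d, b_d)

# Central finite differences on each component of h1 as an independent check
eps = 1e-6
numeric = np.zeros(n_hidden)
for k in range(n_hidden):
    step = np.zeros(n_hidden)
    step[k] = eps
    loss_plus = loss_from_h1(h1 + step, c1, x[1], y, W, U, b, w_d, b_d)[0]
    loss_minus = loss_from_h1(h1 - step, c1, x[1], y, W, U, b, w_d, b_d)[0]
    numeric[k] = (loss_plus - loss_minus) / (2 * eps)

print("analytic dL/dh1:", analytic)
print("numeric  dL/dh1:", numeric)
```

If the two printed vectors agree closely, the hand-written chain rule is consistent with the forward pass. In practice a framework such as Keras computes these gradients automatically, so this kind of check is mainly useful for understanding.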

Answered by Shamit Verma on January 3, 2021
