
LSTM Keras model architecture interpretation

Data Science: Asked by sai on January 24, 2021

I would appreciate it if anyone could correct my interpretation of the LSTM architecture in Keras.

For example, in this simple case:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, input_shape=(10, 2)))
model.add(Dense(1))

My interpretation is that the 2 features get mapped to 32 cells (a sort of dense-layer connection), and each of these cells is unrolled over the 10 timesteps. Then, since return_sequences=False by default, the output is 32 values, one from each cell, which are in turn mapped to a single output neuron.
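
To make these shapes concrete, here is a minimal check (a sketch assuming the TensorFlow 2 Keras API; the dummy input x and the sub-model lstm_out are only for illustration):

import numpy as np
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, input_shape=(10, 2)))
model.add(Dense(1))

x = np.random.rand(4, 10, 2)  # 4 samples, 10 timesteps, 2 features

# The LSTM layer emits (4, 32): 32 values per sample, taken from
# the last of the 10 unrolled timesteps (return_sequences=False).
lstm_out = Model(model.inputs, model.layers[0].output)
print(lstm_out.predict(x).shape)  # (4, 32)

# The Dense layer maps those 32 values to one scalar per sample.
print(model.predict(x).shape)     # (4, 1)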

So, schematically, something like the following, where

F -> feature matrix at one timestep

I -> identity matrix

for each of the 10 timesteps:

  • $F_{[1\times 2]} \times I_{[2\times 32]} \rightarrow$ a $[1\times 32]$ input to the 32 cells
  • $in_{[1\times 32]} \xrightarrow{\text{LSTM equations}}$ a $[1\times 32]$ output (passed to the Dense layer on the last timestep)
    • if last timestep:
      • $[1\times 32] \times [32\times 1]_{\text{trainable weights}} \rightarrow$ a single scalar value
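
The weight shapes behind this picture can be inspected directly; a minimal sketch (again assuming the TensorFlow 2 Keras API):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, input_shape=(10, 2)))
model.add(Dense(1))

# Keras stores the four gate matrices (input, forget, cell, output)
# concatenated, hence the factor of 4 in the second dimension.
kernel, recurrent_kernel, bias = model.layers[0].get_weights()
print(kernel.shape)            # (2, 128): 2 features -> 4 * 32 units (trainable, not a fixed identity)
print(recurrent_kernel.shape)  # (32, 128): 32 units -> 4 * 32 units
print(bias.shape)              # (128,)
print(model.layers[1].get_weights()[0].shape)  # (32, 1): the trainable-weights map to the scalar output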

The second case I am trying to understand is when there are multiple LSTM layers, something like this:

model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(10, 2)))  # return_sequences=True is required so the next LSTM receives a sequence
model.add(LSTM(64))
model.add(Dense(32))
model.add(Dense(1))

Focusing on what happens between the two LSTM layers, I assumed there would be an $I_{[32\times 64]}$ identity matrix, but what controls the timesteps of this second LSTM layer? I mean:

A) Is the output from the 64 cells produced each time the layer below processes its 10 timesteps, or

B) does it wait for a total of 100 timesteps, i.e. 10 samples (10 × 10), and then produce an output?

(I think A should be the right one, but please do comment on it.)
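
For reference, a small sketch (assuming the TensorFlow 2 Keras API) showing how the timesteps flow through the stacked model: with return_sequences=True the first LSTM emits an output at every one of its 10 timesteps, and the second LSTM steps through exactly those 10 outputs.

import numpy as np
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(10, 2)))
model.add(LSTM(64))
model.add(Dense(32))
model.add(Dense(1))

x = np.random.rand(4, 10, 2)  # 4 samples, 10 timesteps, 2 features

# First LSTM returns the full sequence: one 32-vector per timestep.
seq_out = Model(model.inputs, model.layers[0].output)
print(seq_out.predict(x).shape)  # (4, 10, 32)

# Second LSTM consumes those 10 timesteps one by one and, with the
# default return_sequences=False, returns only its last output.
second_out = Model(model.inputs, model.layers[1].output)
print(second_out.predict(x).shape)  # (4, 64)

print(model.predict(x).shape)  # (4, 1) after the two Dense layers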

I hope I was able to express this clearly; if not, please let me know.
