Why does a bidirectional LSTM have half the parameter count compared to an LSTM in Keras?

Data Science Asked by DiMorten - Jorge Chamorro on May 9, 2021

I want to implement a unidirectional and a bidirectional LSTM in TensorFlow Keras with the same number of units. As an example, I implement the unidirectional LSTM with 256 units and the bidirectional LSTM with 128 units (which, as I understand it, gives 128 units per direction, for a total of 256). The implementation details:

import tensorflow as tf

in_ = tf.keras.Input(shape=(28, 28))

# Unidirectional LSTM with 256 units
x = tf.keras.layers.LSTM(256)(in_)
model_unidirectional = tf.keras.Model(in_, x)
model_unidirectional.summary()

# Bidirectional LSTM with 128 units per direction (256 total)
y = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128))(in_)
model_bidirectional = tf.keras.Model(in_, y)
model_bidirectional.summary()

However, looking at the models' summaries, the unidirectional LSTM has nearly double the parameter count of the bidirectional LSTM, even though both have the same output shape:

model_unidirectional summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_11 (InputLayer)        [(None, 28, 28)]          0         
_________________________________________________________________
lstm_11 (LSTM)               (None, 256)               291840    
=================================================================
Total params: 291,840
Trainable params: 291,840
Non-trainable params: 0
_________________________________________________________________

model_bidirectional summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_11 (InputLayer)        [(None, 28, 28)]          0         
_________________________________________________________________
bidirectional_7 (Bidirection (None, 256)               160768    
=================================================================
Total params: 160,768
Trainable params: 160,768
Non-trainable params: 0
_________________________________________________________________

Why does the bidirectional approach have significantly fewer parameters if the output shapes are the same?

2 Answers

For a unidirectional LSTM, the number of parameters is

4 * [(numHiddenUnits + inputSize) * numHiddenUnits + numHiddenUnits]

where the factor 4 comes from the four LSTM gate equations. For your case, numHiddenUnits = 256 and inputSize = 28, which gives 4 * [(256 + 28) * 256 + 256] = 291,840.

For a bidirectional LSTM, the number of parameters is

2 * 4 * [(numHiddenUnits + inputSize) * numHiddenUnits + numHiddenUnits]

where the factor 2 accounts for the two directions and 4 again for the four gate equations. For your case, numHiddenUnits = 128 and inputSize = 28, which gives 2 * 4 * [(128 + 28) * 128 + 128] = 160,768.
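
As a sanity check, the formula can be evaluated directly in plain Python (the variable names here just mirror the formula above):

# Sanity check: evaluate the parameter-count formula.
def lstm_params(num_hidden_units, input_size):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * ((num_hidden_units + input_size) * num_hidden_units
                + num_hidden_units)

print(lstm_params(256, 28))      # 291840, matches the unidirectional summary
print(2 * lstm_params(128, 28))  # 160768, matches the bidirectional summary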

Correct answer by Saurabh Tiwari on May 9, 2021

It's because you wrote:

x = tf.keras.layers.LSTM(256)(in_)

and:

y = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128))(in_)

The number of units per LSTM is different: 256 vs. 128. Because the recurrent weight matrix scales with the square of the unit count, one 256-unit LSTM has more parameters than two 128-unit LSTMs. The output shapes match because a Bidirectional layer is technically a pair of RNN layers, one per direction, whose outputs are concatenated by default (128 + 128 = 256).
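
For illustration, here is a minimal sketch (assuming TensorFlow 2.x) that makes the default concatenation explicit via the merge_mode argument:

import tensorflow as tf

in_ = tf.keras.Input(shape=(28, 28))
# merge_mode='concat' is the Keras default: the 128-dimensional forward and
# backward outputs are concatenated into a single 256-dimensional vector.
y = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(128), merge_mode='concat')(in_)
model = tf.keras.Model(in_, y)

print(model.output_shape)    # (None, 256)
print(model.count_params())  # 160768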

Answered by Leevo on May 9, 2021
