Data Science Asked by DiMorten - Jorge Chamorro on May 9, 2021
I want to implement a unidirectional and a bidirectional LSTM in TensorFlow's Keras API with the same total number of units. As an example, I implement the unidirectional LSTM with 256 units and the bidirectional LSTM with 128 units (which, as I understand it, gives 128 units per direction, for a total of 256). The implementation details:
import tensorflow as tf

# Unidirectional LSTM with 256 units
in_ = tf.keras.Input(shape=(28, 28))
x = tf.keras.layers.LSTM(256)(in_)
model_unidirectional = tf.keras.Model(in_, x)
model_unidirectional.summary()

# Bidirectional LSTM with 128 units per direction (2 * 128 = 256 total)
y = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128))(in_)
model_bidirectional = tf.keras.Model(in_, y)
model_bidirectional.summary()
However, looking at the models' summaries, the unidirectional LSTM has nearly double the parameter count of the bidirectional LSTM, even though the two have the same output shape:
model_unidirectional summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_11 (InputLayer) [(None, 28, 28)] 0
_________________________________________________________________
lstm_11 (LSTM) (None, 256) 291840
=================================================================
Total params: 291,840
Trainable params: 291,840
Non-trainable params: 0
_________________________________________________________________
model_bidirectional summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_11 (InputLayer) [(None, 28, 28)] 0
_________________________________________________________________
bidirectional_7 (Bidirection (None, 256) 160768
=================================================================
Total params: 160,768
Trainable params: 160,768
Non-trainable params: 0
_________________________________________________________________
Why does the bidirectional approach have significantly fewer parameters if the output shape is the same?
For a unidirectional LSTM, the number of parameters is
4 * [(numHiddenUnits + inputSize) * numHiddenUnits + numHiddenUnits]
where the factor of 4 accounts for the four LSTM gate equations. For your case, numHiddenUnits = 256 and inputSize = 28, which gives 291,840.
For a bidirectional LSTM, the number of parameters is
2 * 4 * [(numHiddenUnits + inputSize) * numHiddenUnits + numHiddenUnits]
where the factor of 2 accounts for the two directions and the 4, again, for the four gate equations. For your case, numHiddenUnits = 128 and inputSize = 28, which gives 160,768.
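As a quick sanity check, here is a minimal sketch that evaluates the formula above for both configurations; count_lstm_params is my own helper name, not a Keras API:

# Minimal sketch: evaluate the parameter formula above by hand.
# count_lstm_params is a hypothetical helper, not part of Keras.
def count_lstm_params(num_hidden_units, input_size, bidirectional=False):
    # Four gates, each with input weights, recurrent weights, and a bias
    per_direction = 4 * ((num_hidden_units + input_size) * num_hidden_units
                         + num_hidden_units)
    return 2 * per_direction if bidirectional else per_direction

print(count_lstm_params(256, 28))                      # 291840
print(count_lstm_params(128, 28, bidirectional=True))  # 160768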
Correct answer by Saurabh Tiwari on May 9, 2021
It's because you wrote:
x = tf.keras.layers.LSTM(256)(in_)
and:
y = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128))(in_)
The number of units is different: 256 vs 128. The output shapes match because a Bidirectional RNN layer is technically a pair of RNN layers, one per direction, whose outputs are combined (concatenated by default).
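To make the pairing concrete, here is a minimal sketch (assuming the default merge_mode='concat') showing that the wrapper holds a forward and a backward 128-unit LSTM whose outputs are concatenated into a 256-wide vector:

import tensorflow as tf

in_ = tf.keras.Input(shape=(28, 28))
bi = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128))  # merge_mode='concat' by default
out = bi(in_)

print(out.shape)                         # (None, 256): two 128-unit outputs, concatenated
print(bi.count_params())                 # 160768 in total
print(bi.forward_layer.count_params())   # 80384 for the forward direction
print(bi.backward_layer.count_params())  # 80384 for the backward direction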
Answered by Leevo on May 9, 2021