Data Science Asked by DiMorten - Jorge Chamorro on May 9, 2021
I want to implement a unidirectional and a bidirectional LSTM in TensorFlow's Keras API with the same total number of units. As an example, I implement the unidirectional LSTM with 256 units and the bidirectional LSTM with 128 units (which, as I understand it, gives 128 units per direction, for a total of 256). The implementation details:
import tensorflow as tf

# Unidirectional LSTM with 256 units
in_ = tf.keras.Input(shape=(28, 28))
x = tf.keras.layers.LSTM(256)(in_)
model_unidirectional = tf.keras.Model(in_, x)
model_unidirectional.summary()

# Bidirectional LSTM with 128 units per direction (2 * 128 = 256 total)
y = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128))(in_)
model_bidirectional = tf.keras.Model(in_, y)
model_bidirectional.summary()
However, looking at the models' summaries, the unidirectional LSTM has nearly double the parameter count of the bidirectional LSTM, even though the two have the same output shape:
model_unidirectional summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_11 (InputLayer) [(None, 28, 28)] 0
_________________________________________________________________
lstm_11 (LSTM) (None, 256) 291840
=================================================================
Total params: 291,840
Trainable params: 291,840
Non-trainable params: 0
_________________________________________________________________
model_bidirectional summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_11 (InputLayer) [(None, 28, 28)] 0
_________________________________________________________________
bidirectional_7 (Bidirection (None, 256) 160768
=================================================================
Total params: 160,768
Trainable params: 160,768
Non-trainable params: 0
_________________________________________________________________
Why does the bidirectional approach have significantly fewer parameters if the output shape is the same?
For a unidirectional LSTM, the number of parameters is
4 * [(numHiddenUnits + inputSize) * numHiddenUnits + numHiddenUnits]
where the factor of 4 accounts for the four LSTM gate equations. For your case, numHiddenUnits = 256 and inputSize = 28, which gives 291,840.
For a bidirectional LSTM, the number of parameters is
2 * 4 * [(numHiddenUnits + inputSize) * numHiddenUnits + numHiddenUnits]
where the factor of 2 accounts for the two directions and the 4, again, for the four gate equations. For your case, numHiddenUnits = 128 and inputSize = 28, which gives 160,768.
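As a quick sanity check, here is a minimal sketch that evaluates the formula above for both configurations; count_lstm_params is my own helper name, not a Keras API:

# Minimal sketch: evaluate the parameter formula above by hand.
# count_lstm_params is a hypothetical helper, not part of Keras.
def count_lstm_params(num_hidden_units, input_size, bidirectional=False):
    # Four gates, each with input weights, recurrent weights, and a bias
    per_direction = 4 * ((num_hidden_units + input_size) * num_hidden_units
                         + num_hidden_units)
    return 2 * per_direction if bidirectional else per_direction

print(count_lstm_params(256, 28))                      # 291840
print(count_lstm_params(128, 28, bidirectional=True))  # 160768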
Correct answer by Saurabh Tiwari on May 9, 2021
It's because you wrote:
x = tf.keras.layers.LSTM(256)(in_)
and:
y = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128))(in_)
The number of units is different: 256 vs 128. The output shapes match because a Bidirectional RNN layer is technically a pair of RNN layers, one per direction, whose outputs are combined (concatenated by default).
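To make the pairing concrete, here is a minimal sketch (assuming the default merge_mode='concat') showing that the wrapper holds a forward and a backward 128-unit LSTM whose outputs are concatenated into a 256-wide vector:

import tensorflow as tf

in_ = tf.keras.Input(shape=(28, 28))
bi = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128))  # merge_mode='concat' by default
out = bi(in_)

print(out.shape)                         # (None, 256): two 128-unit outputs, concatenated
print(bi.count_params())                 # 160768 in total
print(bi.forward_layer.count_params())   # 80384 for the forward direction
print(bi.backward_layer.count_params())  # 80384 for the backward direction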
Answered by Leevo on May 9, 2021