How to add attention mechanism to my sequence-to-sequence architecture in Keras?

Data Science Asked on March 10, 2021

Based on this blog entry, I have written a sequence-to-sequence deep learning model in Keras:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

model = Sequential()
model.add(LSTM(hidden_nodes, input_shape=(n_timesteps, n_features)))  # encoder
model.add(RepeatVector(n_timesteps))
model.add(LSTM(hidden_nodes, return_sequences=True))                  # decoder
model.add(TimeDistributed(Dense(n_features, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=30, batch_size=32)

It works reasonably well, but I intend to improve it by applying an attention mechanism. The aforementioned blog post includes a variation of the architecture with attention, relying on custom attention code, but it doesn't work with my present TensorFlow/Keras versions; and in any case, to the best of my knowledge, a generic Attention layer has recently been added to Keras. I was not able to add it to my code, however.
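
Based on the documentation, I assume the wiring would have to look roughly like the sketch below (using the functional API, since the Attention layer takes a list of query and value tensors), but I have not managed to make it fit my Sequential setup:

# A sketch of what I assume is needed (untested): Luong-style dot-product
# attention via the built-in tf.keras.layers.Attention, functional API.
from tensorflow.keras.layers import Input, LSTM, Dense, Attention, Concatenate, TimeDistributed
from tensorflow.keras.models import Model

inputs = Input(shape=(n_timesteps, n_features))
# The encoder must return the full sequence so attention can look at all timesteps
encoder_seq = LSTM(hidden_nodes, return_sequences=True)(inputs)
# The decoder sequence acts as the attention query
decoder_seq = LSTM(hidden_nodes, return_sequences=True)(encoder_seq)
# Attention([query, value]) -> one context vector per decoder timestep
context = Attention()([decoder_seq, encoder_seq])
# Combine the decoder output with the attended context before the softmax
combined = Concatenate()([decoder_seq, context])
outputs = TimeDistributed(Dense(n_features, activation='softmax'))(combined)
model = Model(inputs, outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])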

Additionally, I tried to extend the architecture above by using two LSTM layers each for the encoder and the decoder, instead of one each:

model = Sequential()
model.add(LSTM(hidden_nodes, return_sequences=True, input_shape=(n_timesteps, n_features)))  # encoder 1
model.add(LSTM(hidden_nodes, return_sequences=True))  # encoder 2
model.add(RepeatVector(n_timesteps))
model.add(LSTM(hidden_nodes, return_sequences=True))  # decoder 1
model.add(LSTM(hidden_nodes, return_sequences=True))  # decoder 2
model.add(TimeDistributed(Dense(n_features, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=100, validation_split=0.15, batch_size=32)

but I get an error message (from the 2nd or 3rd line, I assume):

ValueError: Input 0 of layer repeat_vector_17 is incompatible with the layer: expected ndim=2, found ndim=3. Full shape received: [None, 20, 128]

What could be the reason here?

One Answer

By default, an LSTM layer in Keras returns only the output of the last timestep.

model = Sequential()
model.add(LSTM(hidden_nodes, input_shape=(n_timesteps, n_features)))
## output shape is (batch_size, hidden_nodes)

So the step below is needed to repeat that output vector n times, where n is the number of timesteps:

model.add(RepeatVector(n_timesteps))
## now the shape becomes (batch_size, n_timesteps, hidden_nodes)

But when you specify return_sequences=True, the LSTM returns a hidden state for ALL timesteps, so its output is already 3-D with shape (batch_size, n_timesteps, hidden_nodes). In that case you DON'T need a RepeatVector.
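
A quick standalone check makes the difference visible (this snippet is my illustration, not code from the question; it assumes TF 2.x eager mode, and the numbers match the shape in the error above):

import numpy as np
import tensorflow as tf

x = np.zeros((1, 20, 8), dtype="float32")  # (batch, timesteps, features)

# Default: only the last timestep -> 2-D output
print(tf.keras.layers.LSTM(128)(x).shape)  # (1, 128)

# return_sequences=True: all timesteps -> 3-D output, which RepeatVector rejects
print(tf.keras.layers.LSTM(128, return_sequences=True)(x).shape)  # (1, 20, 128)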

So to eliminate the error, just remove the RepeatVector line (the fourth line of your second snippet).
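
Concretely, the stacked model from the question then becomes (my rendering of the fix, not code from the original answer; shapes stay 3-D from end to end):

model = Sequential()
model.add(LSTM(hidden_nodes, return_sequences=True, input_shape=(n_timesteps, n_features)))
model.add(LSTM(hidden_nodes, return_sequences=True))
model.add(LSTM(hidden_nodes, return_sequences=True))
model.add(LSTM(hidden_nodes, return_sequences=True))
model.add(TimeDistributed(Dense(n_features, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])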

Edit - I would recommend using the return_sequences=True option and NOT the RepeatVector option, even though the latter may also compile. Passing the full sequence forward gives the next layer far more data across timesteps, and this is the accepted approach in most situations. A sketch of the RepeatVector variant follows below for comparison.
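
For completeness, the RepeatVector variant alluded to above would look as follows (a sketch, not from the original answer): the last encoder LSTM collapses the sequence to a single vector, which RepeatVector then tiles back out to n_timesteps steps.

model = Sequential()
model.add(LSTM(hidden_nodes, return_sequences=True, input_shape=(n_timesteps, n_features)))
model.add(LSTM(hidden_nodes))          # return_sequences defaults to False -> (batch_size, hidden_nodes)
model.add(RepeatVector(n_timesteps))   # -> (batch_size, n_timesteps, hidden_nodes)
model.add(LSTM(hidden_nodes, return_sequences=True))
model.add(LSTM(hidden_nodes, return_sequences=True))
model.add(TimeDistributed(Dense(n_features, activation='softmax')))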

Answered by Allohvk on March 10, 2021
