Data Science Asked by stv on November 24, 2020
I have an array of sequences of equal length, each sequence contains 300 numbers (M=300). Each element in a sequence is a number from 1 to 9:
13571398...2455 # 300 numbers
33344467...1143 # 300 numbers
...
...
...
66118859...2121 # 300 numbers
My task is to build a model that predicts the elements (numbers) at positions 180 to 190 of a sequence, based on the first 180 elements and the last 109 elements of that sequence.
In other words, given the elements at positions 0 to 179 and at positions 191 to 299, predict the 11 elements at positions 180 to 190.
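To make the setup concrete, here is a minimal NumPy sketch (on random toy data standing in for the real array) of how each sequence splits into the known context and the 11 targets to predict:

```python
import numpy as np

# Toy batch: 5 sequences of length 300, digits 1..9 (stand-in for the real data)
rng = np.random.default_rng(0)
data = rng.integers(1, 10, size=(5, 300))

# Known context: positions 0..179 (180 values) and 191..299 (109 values)
context = np.concatenate([data[:, :180], data[:, 191:]], axis=1)
# Targets to predict: positions 180..190 (11 values)
target = data[:, 180:191]

print(context.shape, target.shape)  # (5, 289) (5, 11)
```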
I am thinking about solving this task with a Keras Bi-LSTM model.
Any other ideas, in particular using other models such as Transformers (PyTorch, TensorFlow), are very welcome, thanks!
The framing of your problem is close to the so-called language modeling task — more precisely, masked language modeling, where hidden tokens must be reconstructed from their surroundings. Because your input data consists of fixed-length samples, you can use a seq2seq model with a fixed-size context embedding.
Concretely, you would have an encoder, for example a Bi-LSTM, which encodes your input into a fixed representation (by concatenating the final states of the forward and backward LSTMs), and a decoder, for example an LSTM, which produces the output tokens sequentially.
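A minimal Keras sketch of this encoder-decoder idea might look as follows. The layer sizes (embedding dim 32, 64 LSTM units) are illustrative choices, not tuned values; the fixed encoder state is simply repeated once per output position via `RepeatVector`, which is one common way to decode a fixed-length span:

```python
import numpy as np
from tensorflow.keras import layers, models

VOCAB = 10      # digits 1..9, with index 0 left free
CTX_LEN = 289   # 180 known prefix positions + 109 known suffix positions
OUT_LEN = 11    # masked positions 180..190

inputs = layers.Input(shape=(CTX_LEN,), dtype="int32")
x = layers.Embedding(VOCAB, 32)(inputs)
# Bi-LSTM encoder: the concatenated final states form a fixed-size context vector
x = layers.Bidirectional(layers.LSTM(64))(x)
# Repeat the context vector once per output position, then decode with an LSTM
x = layers.RepeatVector(OUT_LEN)(x)
x = layers.LSTM(64, return_sequences=True)(x)
# Per-position distribution over the vocabulary
outputs = layers.TimeDistributed(layers.Dense(VOCAB, activation="softmax"))(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

dummy = np.random.randint(1, 10, size=(2, CTX_LEN))
preds = model.predict(dummy, verbose=0)
print(preds.shape)  # (2, 11, 10)
```

Training with integer targets of shape `(batch, 11)` then works directly via `model.fit(context, target)`.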
Your objective function could be the mean of the cross-entropy loss over each output token, or a more complex loss such as CTC. You can also simplify the task by predicting only the masked tokens, instead of the whole sequence, as the output of your neural network.
The fact that your tokens are integers makes no difference and actually simplifies the embedding: you can feed the data as-is to an embedding layer in Keras or PyTorch. If you use PyTorch, there is this tutorial that I would recommend, which uses a Transformer instead of an LSTM.
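Feeding the integer tokens directly to an embedding layer can be sketched in PyTorch like this (an embedding dimension of 32 is an arbitrary illustrative choice):

```python
import torch
import torch.nn as nn

# Digits 1..9 can index the embedding table directly; size 10 leaves index 0 free
embed = nn.Embedding(num_embeddings=10, embedding_dim=32)

batch = torch.randint(1, 10, (4, 289))  # 4 sequences of 289 context tokens
vectors = embed(batch)
print(vectors.shape)  # torch.Size([4, 289, 32])
```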
Correct answer by Adam Oudad on November 24, 2020