Data Science Asked on June 10, 2021
Is it possible to compute attention with / adapt existing transformer architectures (like Longformer) so that they work on multi-dimensional sequence input?

That is, instead of a 1D array of tokens (e.g., a Python list of tokens over which attention is computed), I would feed in an array of 2D/3D/4D tokens and pre-train my language model on it with the Masked Language Modelling technique (i.e., predicting masked tokens).

Is this even possible? Any idea what modifications I would have to make?
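For concreteness, here is a minimal sketch of one common way to do this (not the Longformer API itself): flatten an N-D grid of tokens into a 1-D sequence and replace the usual 1-D positional embedding with one learned embedding table per axis, summed together, then train with a standard MLM objective. The class name `GridMLMEncoder`, the 15% masking rate, and the `[MASK]` id of 103 are illustrative assumptions, not from the question.

```python
import torch
import torch.nn as nn


class GridMLMEncoder(nn.Module):
    """Sketch: MLM over a 2-D grid of tokens via flatten + per-axis positions."""

    def __init__(self, vocab_size=30522, d_model=256, grid=(16, 16),
                 nhead=4, num_layers=2):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        # One positional embedding table per spatial axis (2-D here;
        # add more tables for 3-D/4-D inputs).
        self.pos_row = nn.Embedding(grid[0], d_model)
        self.pos_col = nn.Embedding(grid[1], d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.mlm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_grid):               # token_grid: (batch, H, W) int64
        b, h, w = token_grid.shape
        rows = torch.arange(h, device=token_grid.device)
        cols = torch.arange(w, device=token_grid.device)
        pos = (self.pos_row(rows)[:, None, :] +
               self.pos_col(cols)[None, :, :])   # (H, W, d_model)
        x = self.tok(token_grid) + pos           # (b, H, W, d_model)
        x = x.reshape(b, h * w, -1)              # flatten the grid to a sequence
        x = self.encoder(x)
        return self.mlm_head(x)                  # (b, H*W, vocab_size)


# Usage: mask ~15% of grid positions and compute cross-entropy only on them.
model = GridMLMEncoder()
grid = torch.randint(0, 30522, (2, 16, 16))
mask = torch.rand(grid.shape) < 0.15
inputs = grid.masked_fill(mask, 103)             # 103 = assumed [MASK] token id
logits = model(inputs)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 30522)[mask.reshape(-1)],
    grid.reshape(-1)[mask.reshape(-1)])
```

To use a long-sequence model such as Longformer instead of the dense encoder above, the same flatten-plus-positional-embedding idea applies; the sparse attention pattern would then operate over the flattened sequence.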