Data Science, asked on June 10, 2021
Is it possible to compute attention with, or adapt, existing transformer architectures (like Longformer) so that they work on multi-dimensional sequence input?

That is, instead of a 1D array of tokens (e.g. a Python list of tokens over which attention is calculated), I want to feed in an array of 2D/3D/4D tokens and pre-train my language model on that via the Masked Language Modelling technique (i.e. predicting masked tokens).

Is this even possible? Any idea what modifications I would have to make?
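For context, here is a minimal sketch of one way people commonly handle grid-shaped token input (in the spirit of ViT-style flattening or axial positional encodings), rather than anything Longformer itself provides: flatten the 2D grid into a 1D sequence but give each cell separate learned row and column position embeddings, then train with standard MLM masking. Everything below (the `Grid2DMLMEncoder` class name, dimensions, the reserved `mask_id`) is an illustrative assumption, and it uses a plain PyTorch encoder instead of Longformer's sparse attention.

```python
import torch
import torch.nn as nn

class Grid2DMLMEncoder(nn.Module):
    """Illustrative sketch: transformer over a flattened 2D token grid with per-axis position embeddings."""
    def __init__(self, vocab_size, max_rows, max_cols, d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.row_emb = nn.Embedding(max_rows, d_model)   # positional embedding along axis 0
        self.col_emb = nn.Embedding(max_cols, d_model)   # positional embedding along axis 1
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.mlm_head = nn.Linear(d_model, vocab_size)   # predicts token IDs at each position

    def forward(self, grid_ids):                         # grid_ids: (batch, rows, cols)
        b, r, c = grid_ids.shape
        rows = torch.arange(r, device=grid_ids.device).view(1, r, 1).expand(b, r, c)
        cols = torch.arange(c, device=grid_ids.device).view(1, 1, c).expand(b, r, c)
        x = self.tok_emb(grid_ids) + self.row_emb(rows) + self.col_emb(cols)
        x = x.view(b, r * c, -1)                         # flatten the grid into a 1D sequence
        h = self.encoder(x)                              # full self-attention over all cells
        return self.mlm_head(h)                          # (batch, rows*cols, vocab_size)

# Usage sketch: mask ~15% of the grid cells and compute the MLM loss only on those cells.
vocab_size, mask_id = 1000, 999                          # assumed: 999 is a reserved [MASK] id
model = Grid2DMLMEncoder(vocab_size, max_rows=32, max_cols=32)
grid = torch.randint(0, vocab_size - 1, (2, 8, 8))       # toy batch of 2D token grids
mask = torch.rand(grid.shape) < 0.15                     # choose cells to mask
inputs = grid.masked_fill(mask, mask_id)
logits = model(inputs)                                   # (2, 64, vocab_size)
flat_mask = mask.view(2, -1)
loss = nn.functional.cross_entropy(logits[flat_mask], grid.view(2, -1)[flat_mask])
```

The same idea extends to 3D/4D input by adding one learned position embedding per axis before flattening; the open question for long sequences is then how to keep attention sparse (e.g. Longformer-style windows) once the flattened sequence gets large.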