Asked by Valentas on December 25, 2020
Attention Is All You Need is a nice paper that introduces the Transformer architecture, which replaces recurrence (RNNs) with self-attention plus positional encodings. GPT-2 and GPT-3 are examples of this architecture trained on input data at a massive scale.
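For context, the positional encoding described in that paper is the fixed sinusoidal one, where $PE_{(pos,2i)} = \sin(pos/10000^{2i/d_{model}})$ and $PE_{(pos,2i+1)} = \cos(pos/10000^{2i/d_{model}})$. Below is a minimal sketch of how it can be computed; the sequence length and model dimension are arbitrary illustrative values.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encodings from "Attention Is All You Need".

    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    """
    positions = np.arange(seq_len)[:, np.newaxis]                      # (seq_len, 1)
    div_terms = np.power(10000.0, np.arange(0, d_model, 2) / d_model)  # (d_model/2,)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions / div_terms)  # even dimensions
    pe[:, 1::2] = np.cos(positions / div_terms)  # odd dimensions
    return pe

# Each row is added to the token embedding at that position, giving the model
# information about token order without any recurrence.
print(sinusoidal_positional_encoding(seq_len=50, d_model=16).shape)  # (50, 16)
```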
Is there a paper and a model that uses positional encodings and outcompetes RNN/LSTM-based models on small-scale datasets (MBs of text data, not terabytes)?
If there are many, which ones are the leading ones in production applications?
Is there a paper and a model that uses positional encodings and outcompetes RNN/LSTM-based models on small-scale datasets (MBs of text data, not terabytes)?
Yes, there are several. Like GPT, they are still pre-trained on terabytes of data, but the embeddings they learn generalize well, so you can then fine-tune them on a much smaller dataset. It works much the same way as transfer learning with a CNN, where a model is first trained on ImageNet and then fine-tuned on a specific task. This approach tends to give better results than RNNs/LSTMs.
If there are many, which ones are the leading ones in production applications?
The one that sees the most use is definitely BERT. Here is a really nice explanation of how it works. The transformers library from Hugging Face makes it really easy to work with BERT and other transformers that have already been pre-trained.
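As a rough illustration of the pre-train/fine-tune workflow described above, here is a minimal sketch using the transformers and datasets libraries. The checkpoint (bert-base-uncased), the IMDB dataset, the subset sizes, and the hyperparameters are all illustrative stand-ins for "a much smaller dataset", not a recommended recipe.

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

# Load a pre-trained BERT checkpoint and its tokenizer.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# A small slice of a public sentiment dataset stands in for "MBs of text data".
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

# Fine-tune only briefly: the pre-trained weights already encode most of the
# language knowledge, so a couple of epochs on a small dataset is often enough.
args = TrainingArguments(
    output_dir="bert-finetuned",
    num_train_epochs=2,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=dataset["test"].shuffle(seed=42).select(range(500)),
)
trainer.train()
```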
Answered by Simon Larsson on December 25, 2020