Suppose I am pre-training a Masked Language Model on a specific dataset in which most sequences contain a particular token at high frequency.
Sample sequence:
<tok1>, <tok1>, <tok4>, <tok7>, <tok4>, <tok4> ---> here <tok4> is very frequent in this sequence
So if I mask some tokens and train the model to predict those masked tokens, the model will obviously become biased toward predicting <tok4> simply because of its statistical frequency.
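To make the issue concrete, here is a minimal sketch (the token names, corpus, and the 15% masking rate are illustrative assumptions, the rate following the common BERT-style setup) showing how uniform random masking makes the frequent token dominate the prediction targets:

```python
import random
from collections import Counter

random.seed(0)

# Toy corpus: every sequence contains tok4 at high frequency,
# mirroring the sample sequence above.
corpus = [["tok1", "tok1", "tok4", "tok7", "tok4", "tok4"] for _ in range(1000)]

MASK_PROB = 0.15  # common BERT-style masking rate

masked_targets = Counter()
for seq in corpus:
    for tok in seq:
        # Uniform masking: every position is masked with equal probability,
        # so frequent tokens dominate the set of prediction targets.
        if random.random() < MASK_PROB:
            masked_targets[tok] += 1

print(masked_targets)
# Roughly: tok4 ~450, tok1 ~300, tok7 ~150
```

With these counts, <tok4> accounts for about half of all masked targets, so the training loss is dominated by predicting it.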
Since <tok4> represents important information, downsampling (i.e., removing those frequent tokens) is not preferred; I would like to keep my sequences as intact as possible.
How should I best deal with this? Is there an established method that counters this problem?