Impact of a new word on word embedding vectors

Data Science Asked on February 4, 2021

Question

What is the impact of a new word on word embedding vectors that were trained before the word existed?

For instance, as of November 2019 there were multiple pre-trained models from Hugging Face containing word embedding vectors for the words that occurred in their training and test data. Now there is a new word, COVID-19. What happens to the vectors? For instance, will the vector for vaccine change if the model is re-trained on new data that includes COVID-19?
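One point worth noting: BERT-style models on Hugging Face use subword tokenization (WordPiece, BPE, etc.), so an unseen word like COVID-19 is not simply out-of-vocabulary; it is split into known subword pieces, each of which already has an embedding. Below is a minimal sketch of greedy longest-match-first splitting in the WordPiece style; the vocabulary here is a toy one for illustration, not BERT's actual vocabulary.

```python
def wordpiece_split(word, vocab):
    """Greedy longest-match-first subword splitting (WordPiece-style).

    Continuation pieces are marked with a leading '##', as in BERT.
    Returns ['[UNK]'] if no decomposition into known pieces exists.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            sub = word[start:end]
            if start > 0:          # non-initial pieces carry the '##' prefix
                sub = "##" + sub
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces


# Toy vocabulary (illustrative only; the real BERT vocabulary differs):
vocab = {"co", "##vid", "##-", "##19", "vaccine", "virus"}
print(wordpiece_split("covid-19", vocab))  # -> ['co', '##vid', '##-', '##19']
print(wordpiece_split("vaccine", vocab))   # -> ['vaccine']
```

So the model can represent COVID-19 even before any re-training, although the composed representation will only become specific to the new meaning after further training on text that uses the word.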

I suppose the locations of, and distances between, word vectors represent the relations between the words. Will adding COVID-19 relocate the vectors of existing related words such as vaccine, flu, and virus?
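Generally yes: whether embeddings come from counting or from prediction, a word's vector is determined by the contexts it appears in, so new text containing COVID-19 changes the contexts, and hence the vectors, of the words around it. A toy sketch with raw co-occurrence vectors and cosine similarity makes this concrete; the corpus, window size, and counts here are illustrative assumptions, not a real training setup.

```python
import math
from collections import Counter

def cooc_vectors(sentences, window=2):
    """Symmetric co-occurrence counts within a fixed window."""
    vecs = {}
    for sent in sentences:
        toks = sent.lower().split()
        for i, w in enumerate(toks):
            ctx = vecs.setdefault(w, Counter())
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if j != i:
                    ctx[toks[j]] += 1
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

old_corpus = [
    "the flu virus spreads fast",
    "the vaccine stops the flu",
]
# Re-training data now includes the new word:
new_corpus = old_corpus + [
    "the covid-19 vaccine stops the virus",
    "covid-19 is a virus",
]

before = cooc_vectors(old_corpus)
after = cooc_vectors(new_corpus)
# The vaccine/virus similarity shifts once covid-19 contexts are added:
print(cosine(before["vaccine"], before["virus"]))
print(cosine(after["vaccine"], after["virus"]))
```

The same intuition carries over to learned embeddings such as word2vec or contextual models: re-training on the new data moves not just the new word's vector but also those of the words it co-occurs with.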

If fine-tuning (transfer learning) has already been done on the model without COVID-19, I suppose we need to import the new model and re-run the fine-tuning. Is this correct? Does an NLP system that has been using the model without COVID-19 need to be updated with the new model as well?
