
ELMo - How does the model transfer its learning/weights on new sentences

Data Science · Asked by dshero on February 22, 2021

Word2vec and GloVe embeddings assign a single vector representation to every word in the corpus and do not take context into consideration.

For example:

  • The dog does bark at people
  • The bark of the tree is hard.

In the above examples, Word2vec and GloVe create one vector for the word "bark". But ELMo would produce two different representations for "bark" because it takes context into account. So I am trying to understand the mechanism behind how ELMo produces a vector for a new sentence in the data set.

Say we have a pre-trained model that has different vectors for various words. When I use this model on a new sentence, does ELMo produce completely new vectors? If so, is the model effectively being fine-tuned every time it is applied to new data? And in that case, when is it considered a completely trained model?

One Answer

ELMo does not look up embeddings from a precomputed table the way Word2Vec and GloVe do. Embeddings from ELMo are hidden states of an LSTM-based language model, i.e., they are computed on the fly when you give a sentence to the network. The pre-trained weights themselves are not updated in the process, so embedding new sentences does not fine-tune the model.
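
To make this concrete, here is a minimal sketch of computing ELMo vectors on the fly. It assumes the legacy allennlp 0.x package, whose ElmoEmbedder downloads the pre-trained biLM weights on first use; the token indices and the choice of comparing the top layer are illustrative only.

    # Minimal sketch: ELMo computes vectors on the fly per sentence.
    # Assumes the legacy allennlp 0.x package and an internet connection
    # (ElmoEmbedder downloads the pre-trained weights on first use).
    from allennlp.commands.elmo import ElmoEmbedder
    from scipy.spatial.distance import cosine

    elmo = ElmoEmbedder()

    # The same word "bark" in two different contexts.
    sent1 = ["The", "dog", "does", "bark", "at", "people"]
    sent2 = ["The", "bark", "of", "the", "tree", "is", "hard"]

    # embed_sentence returns an array of shape (3, num_tokens, 1024):
    # layer 0 is the context-independent char-CNN output,
    # layers 1 and 2 are the states of the two biLSTM layers.
    vecs1 = elmo.embed_sentence(sent1)
    vecs2 = elmo.embed_sentence(sent2)

    # Compare "bark" (index 3 in sent1, index 1 in sent2) at the top layer.
    distance = cosine(vecs1[2, 3], vecs2[2, 1])
    print(distance)  # noticeably > 0: the two "bark" vectors differ

Running the same comparison at layer 0 would give a distance of 0, since that layer comes from the character CNN and is context-independent.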

ELMo does not even use standard word embeddings as the LSTM input. Words are treated as character sequences, which are processed with a 1-dimensional CNN that provides a vector representation of each word. The word representations are then passed into two two-layer LSTMs, which are trained as a forward and a backward language model respectively. What you get as a contextual word embedding is a weighted average of the LSTM states.
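
The layer combination can be sketched as follows: in the ELMo paper it is a softmax-normalized, task-specific weighted sum of the layer states, scaled by a factor gamma, with the weights learned together with the downstream task. The weight values below are purely illustrative.

    # Sketch of ELMo's layer combination: a softmax-normalized,
    # task-specific weighted average of the biLM layer states, scaled
    # by gamma. In practice s and gamma are learned with the downstream
    # task; the values used here are illustrative only.
    import numpy as np

    def combine_layers(layer_states, s, gamma=1.0):
        # layer_states: (num_layers, num_tokens, dim) array of biLM states
        weights = np.exp(s) / np.exp(s).sum()  # softmax over layers
        return gamma * np.tensordot(weights, layer_states, axes=1)

    # Three layers (char-CNN + two LSTM layers), 6 tokens, 1024 dimensions.
    states = np.random.randn(3, 6, 1024)
    elmo_vectors = combine_layers(states, s=np.array([0.1, 0.3, 0.6]))
    print(elmo_vectors.shape)  # (6, 1024): one contextual vector per token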

Answered by Jindřich on February 22, 2021
