Data Science
Asked by 不是phd的phd on December 10, 2020
I have read the code of ELMo. Based on my understanding, ELMo first initializes a word embedding matrix A for all the words in the vocabulary, then adds an LSTM B on top of it, and finally uses B's outputs to predict each word's next word.
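To make my understanding concrete, here is a minimal sketch in PyTorch of the setup I mean. All the names (TinyLM, A, B) are hypothetical, and the official ELMo differs (it builds token representations with a character CNN and trains a bidirectional LSTM), but the gradient flow is the same: the next-word loss backpropagates through B into A.

```python
# Minimal sketch (not the real ELMo code): an embedding matrix "A"
# feeds an LSTM "B", whose hidden states predict the next token.
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.A = nn.Embedding(vocab_size, emb_dim)               # word embedding matrix A
        self.B = nn.LSTM(emb_dim, hidden_dim, batch_first=True)  # LSTM B
        self.out = nn.Linear(hidden_dim, vocab_size)             # next-word prediction head

    def forward(self, token_ids):
        x = self.A(token_ids)   # (batch, seq, emb_dim)
        h, _ = self.B(x)        # (batch, seq, hidden_dim)
        return self.out(h)      # logits over the vocab at each position

# One training step: the loss at position t is the cross-entropy of
# predicting token t+1, so gradients flow back through B into A.
model = TinyLM()
tokens = torch.randint(0, 1000, (8, 20))   # fake batch of token ids
logits = model(tokens[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 1000), tokens[:, 1:].reshape(-1))
loss.backward()                            # updates reach A as well as B
```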
I am wondering why, after training, we can look up each word in the vocabulary and take its final representation from the word embedding matrix A alone. It seems that we lose the information stored in LSTM B.

Why does the embedding contain the information we want from the language model? Why does the training process inject the information needed for a good word representation into the word embedding matrix A?
I was wrong: ELMo also uses the outputs of the LSTM, as the context-dependent representation. The output taken only from the word embedding is the context-independent representation.

Why is that representation useful? I think it is because the model is learning the differences between words, so the representation captures how words differ from one another rather than the "real meaning" of each word.
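To illustrate the distinction, a minimal sketch with made-up sizes (not the real ELMo API): the context-independent vector is a plain lookup in the embedding table, while the context-dependent vector comes out of the LSTM and changes with the surrounding words.

```python
import torch
import torch.nn as nn

# Tiny stand-ins for the embedding matrix A and the LSTM B.
A = nn.Embedding(1000, 64)
B = nn.LSTM(64, 128, batch_first=True)

sentence = torch.randint(0, 1000, (1, 5))   # fake 5-token sentence

with torch.no_grad():
    # Context-independent: a pure table lookup, so the same token id
    # always yields the same vector, whatever its neighbours are.
    static_vecs = A(sentence)               # (1, 5, 64)

    # Context-dependent: each position's LSTM state also depends on the
    # tokens before it, so the same word can get different vectors in
    # different sentences.
    contextual_vecs, _ = B(A(sentence))     # (1, 5, 128)
```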
Correct answer by 不是phd的phd on December 10, 2020