
Why is an activation function not needed during the runtime of a Word2Vec model?

Data Science · Asked on May 15, 2021

In a trainable Word2Vec model, there are two different weight matrices: the matrix $W$ from the input layer to the hidden layer, and the matrix $W'$ from the hidden layer to the output layer.

Referring to this article, I understand that the reason we have the matrix $W'$ is basically to compensate for the lack of an activation function in the output layer. Since an activation function is not needed at runtime, there is none in the output layer. But we still need to update the input-to-hidden weight matrix $W$ through backpropagation to eventually arrive at the word embeddings most suitable for our use case, and that is why the weight matrix $W'$ exists in the output layer.
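For context, here is a minimal sketch of the two-matrix forward pass being described, in plain NumPy with invented vocabulary and embedding sizes (nothing below comes from the question itself). Note that the "hidden layer" is just a row lookup in $W$ with no activation applied, and the softmax at the output is only needed while training:

```python
import numpy as np

vocab_size, embed_dim = 10_000, 300  # illustrative sizes, not from the question

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(vocab_size, embed_dim))        # input-to-hidden
W_prime = rng.normal(scale=0.01, size=(embed_dim, vocab_size))  # hidden-to-output

def forward(center_word_idx):
    h = W[center_word_idx]   # hidden layer: a pure row lookup, no activation
    scores = h @ W_prime     # output layer: linear scores over the vocabulary
    # softmax turns scores into context-word probabilities; it is only used
    # to compute the training loss, not when reading off embeddings at runtime
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()

# at runtime, the embedding for a word is simply its row of W
embedding = W[42]
```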

But my question is: why is an activation function not needed at runtime? Can anyone please explain?

2 Answers

From this StackOverflow question:

While no activation is explicitly formulated, we could consider the model to be a linear classifier. It appears that the dependencies the word2vec models try to capture can be achieved with a linear relation between the input words.

Adding a non-linear activation function would allow the neural network to learn more complex functions, which could in turn fit the input onto something more complex that does not preserve the linear dependencies word2vec seeks.
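Those linear dependencies are exactly what makes word-vector arithmetic work. As a quick illustration (assuming you have a pretrained word2vec-format vector file on disk; the file name below is a placeholder), the classic king/queen analogy falls out of purely linear operations:

```python
from gensim.models import KeyedVectors

# placeholder path: any word2vec-format vector file will do
vectors = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# vector("king") - vector("man") + vector("woman") ~= vector("queen"):
# a purely linear combination, with no activation function involved
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```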

Answered by Joschua Xner on May 15, 2021

I think a word2vec model is supposed to be a linear classifier. We want a model that can represent the relative meaning of words in a Euclidean, human-interpretable space. That way, we can calculate distances between word vectors that are understandable and easy for us humans to interpret.
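As a minimal sketch of what "distances between word vectors" means in practice (the three-dimensional toy vectors below are invented purely for illustration), both Euclidean distance and cosine similarity are straightforward to compute:

```python
import numpy as np

# toy 3-dimensional embeddings, invented for illustration only
king = np.array([0.8, 0.3, 0.1])
queen = np.array([0.7, 0.4, 0.2])
apple = np.array([0.1, 0.9, 0.8])

def cosine_similarity(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(np.linalg.norm(king - queen))    # small Euclidean distance: related words
print(cosine_similarity(king, queen))  # high cosine similarity
print(cosine_similarity(king, apple))  # lower similarity: unrelated words
```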

Answered by Leevo on May 15, 2021
