Data Science Asked by tidylobster on March 12, 2021
I’m reading a TensorFlow tutorial on Word2Vec models and got confused with the objective function. The base softmax function is the following:
$P(w_t \mid h) = \mathrm{softmax}\bigl(\mathrm{score}(w_t, h)\bigr) = \frac{\exp[\mathrm{score}(w_t, h)]}{\sum_{w' \in V} \exp[\mathrm{score}(w', h)]}$, where $\mathrm{score}$ computes the compatibility of word $w_t$ with the context $h$ (a dot product is commonly used). We train this model by maximizing its log-likelihood on the training set, i.e. by maximizing $J_{\text{ML}} = \log P(w_t \mid h) = \mathrm{score}(w_t, h) - \log\Bigl(\sum_{w' \in V} \exp[\mathrm{score}(w', h)]\Bigr)$
But why did the $\log$ disappear from the $\mathrm{score}(w_t, h)$ term?
No, the logarithm doesn't disappear. From the equation

$P(w_t \mid h) = \frac{\exp[\mathrm{score}(w_t, h)]}{\sum_{w' \in V} \exp[\mathrm{score}(w', h)]}$,

when you want to calculate $\log P(w_t \mid h)$, it essentially means calculating

$\log\Bigl(\frac{\exp[\mathrm{score}(w_t, h)]}{\sum_{w' \in V} \exp[\mathrm{score}(w', h)]}\Bigr)$.

Now $\log\frac{a}{b} = \log a - \log b$,

so $\log P(w_t \mid h) = \log\bigl(\exp[\mathrm{score}(w_t, h)]\bigr) - \log\Bigl(\sum_{w' \in V} \exp[\mathrm{score}(w', h)]\Bigr) = \mathrm{score}(w_t, h) - \log\Bigl(\sum_{w' \in V} \exp[\mathrm{score}(w', h)]\Bigr)$,

as $\log(\exp(x)) = x$.
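A quick numerical sketch of this identity with NumPy (the scores and the target index below are made up purely for illustration):

```python
import numpy as np

# Hypothetical scores score(w', h) for a tiny vocabulary; the target word w_t is at index t.
scores = np.array([2.0, -1.0, 0.5, 3.2, 0.0])
t = 3

# Left-hand side: softmax first, then the log of the target word's probability.
softmax = np.exp(scores) / np.sum(np.exp(scores))
log_prob = np.log(softmax[t])

# Right-hand side: score(w_t, h) - log(sum over w' of exp[score(w', h)]).
log_likelihood = scores[t] - np.log(np.sum(np.exp(scores)))

print(log_prob, log_likelihood)  # the two values match
assert np.isclose(log_prob, log_likelihood)
```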
Correct answer by Gyan Ranjan on March 12, 2021
It's just an optimisation, for the sake of speed and numerical stability. The two are equivalent for the purpose of determining the gradient since log(x) is monotonically increasing with x.
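A small sketch of that point (again NumPy, with made-up scores): the numerical gradients of $P(w_t \mid h)$ and $\log P(w_t \mid h)$ with respect to the scores differ only by the positive factor $1 / P(w_t \mid h)$, so they point in the same direction.

```python
import numpy as np

def prob(scores, t):
    """P(w_t | h) under the softmax, for a vector of scores."""
    return np.exp(scores[t]) / np.sum(np.exp(scores))

scores = np.array([2.0, -1.0, 0.5, 3.2, 0.0])  # made-up score(w', h) values
t, eps = 3, 1e-6

# Central-difference gradients of P and log P with respect to each score.
grad_p = np.zeros_like(scores)
grad_logp = np.zeros_like(scores)
for i in range(len(scores)):
    bump = np.zeros_like(scores)
    bump[i] = eps
    grad_p[i] = (prob(scores + bump, t) - prob(scores - bump, t)) / (2 * eps)
    grad_logp[i] = (np.log(prob(scores + bump, t)) - np.log(prob(scores - bump, t))) / (2 * eps)

# grad(log P) = grad(P) / P: the same direction, scaled by the positive constant 1/P.
assert np.allclose(grad_logp, grad_p / prob(scores, t), atol=1e-5)
```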
Answered by Jinglesting on March 12, 2021