Data Science: asked by Antek Smagiel on March 22, 2021
I'm stuck on implementing backpropagation for a binary classification task using a GRU, and I'd like to know how to proceed. I was able to understand how backpropagation works in a simple RNN because it is very similar to an MLP, but I don't know how to do it in the context of a GRU.
Let's say we want to use a GRU RNN for our binary classification task, i.e. $y_{true} \in \{0, 1\}$.
We have the following gated recurrent unit (GRU):
\begin{equation*}
\begin{aligned}
\Gamma_{r}^{<t>} & = \sigma\left(\gamma_{r}^{<t>}\right) = \sigma\left(U_r \; a^{<t-1>} + W_r \; x^{<t>} + b_r\right) \\
\Gamma_{u}^{<t>} & = \sigma\left(\gamma_{u}^{<t>}\right) = \sigma\left(U_u \; a^{<t-1>} + W_u \; x^{<t>} + b_u\right) \\
C^{<t>} & = \tanh\left(c^{<t>}\right) = \tanh\left(U_c \left(\Gamma_{r}^{<t>} \odot a^{<t-1>}\right) + W_c \; x^{<t>} + b_c\right) \\
a^{<t>} & = \left(1-\Gamma_{u}^{<t>}\right) \odot a^{<t-1>} + \Gamma_{u}^{<t>} \odot C^{<t>}
\end{aligned}
\end{equation*}
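For concreteness, here is a minimal NumPy sketch of the forward step these equations describe (the parameter names, shapes, and the cache are my own assumptions, not taken from any particular library):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_forward_step(x_t, a_prev, params):
    """One forward step of the GRU cell above.

    Assumed (hypothetical) shapes: x_t is (n_x, 1), a_prev is (n_a, 1),
    U_* are (n_a, n_a), W_* are (n_a, n_x), b_* are (n_a, 1).
    """
    U_r, W_r, b_r = params["U_r"], params["W_r"], params["b_r"]
    U_u, W_u, b_u = params["U_u"], params["W_u"], params["b_u"]
    U_c, W_c, b_c = params["U_c"], params["W_c"], params["b_c"]

    Gamma_r = sigmoid(U_r @ a_prev + W_r @ x_t + b_r)   # reset gate
    Gamma_u = sigmoid(U_u @ a_prev + W_u @ x_t + b_u)   # update gate
    C = np.tanh(U_c @ (Gamma_r * a_prev) + W_c @ x_t + b_c)  # candidate state
    a_t = (1.0 - Gamma_u) * a_prev + Gamma_u * C         # new hidden state

    # Cache the intermediates, since I understand they are needed for
    # backpropagation through time.
    cache = (x_t, a_prev, Gamma_r, Gamma_u, C, a_t)
    return a_t, cache
```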
Only at the final time step $T$ do we compute
\begin{equation*}
\begin{split}
v^{<T>} & = W_{ya} \; a^{<T>} + b_{y}, \\
y^{<T>} & = \sigma\left(v^{<T>}\right).
\end{split}
\end{equation*}
Hence, the loss function is only computed at the final time $T$ over a single training example as
$$
\mathcal{L} = -\left[y_{true} \times \ln\left(y^{<T>}\right) + \left(1-y_{true}\right) \times \ln\left(1-y^{<T>}\right)\right].
$$
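The only piece I am fairly confident about is the gradient at the output, since the sigmoid combined with this cross-entropy loss simplifies in the usual way:

$$
\frac{\partial \mathcal{L}}{\partial v^{<T>}} = y^{<T>} - y_{true},
\qquad
\frac{\partial \mathcal{L}}{\partial W_{ya}} = \left(y^{<T>} - y_{true}\right) \left(a^{<T>}\right)^{\top},
\qquad
\frac{\partial \mathcal{L}}{\partial b_{y}} = y^{<T>} - y_{true}.
$$

What I cannot work out is how to push $\frac{\partial \mathcal{L}}{\partial a^{<T>}} = W_{ya}^{\top}\left(y^{<T>} - y_{true}\right)$ backwards through the gates at each earlier time step.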
The activation functions are
$$
\sigma(x) = \frac{1}{1+e^{-x}}
$$
$$
\tanh(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}
$$
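For reference, their derivatives, which I assume will appear throughout the backward pass, are

$$
\sigma'(x) = \sigma(x)\left(1-\sigma(x)\right), \qquad \tanh'(x) = 1-\tanh^{2}(x).
$$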
The question is, how can I compute the derivatives w.r.t. all the parameters?