
Need help to understand backpropagation for gated recurrent units (GRU)

Data Science. Asked by Antek Smagiel on March 22, 2021

I’m stuck on implementing backpropagation for a binary classification task with a GRU, and I would like to know how to proceed. I understand how backpropagation works in a simple RNN because it is very similar to an MLP, but I don’t know how to do it in the context of a GRU.

Let’s say we want to use a GRU RNN for our binary classification task, i.e. $y_{true} \in \{0, 1\}$.

We have the following gated recurrent unit:

\begin{equation*}
\begin{aligned}
\Gamma_{r}^{<t>} & = \sigma\left(\gamma_{r}^{<t>}\right) = \sigma\left(U_r \; a^{<t-1>} + W_r \; x^{<t>} + b_r\right) \\
\Gamma_{u}^{<t>} & = \sigma\left(\gamma_{u}^{<t>}\right) = \sigma\left(U_u \; a^{<t-1>} + W_u \; x^{<t>} + b_u\right) \\
C^{<t>} & = \tanh\left(c^{<t>}\right) = \tanh\left(U_c \left(\Gamma_{r}^{<t>} \odot a^{<t-1>} \right) + W_c \; x^{<t>} + b_c\right) \\
a^{<t>} & = \left(1-\Gamma_{u}^{<t>}\right) \odot a^{<t-1>} + \Gamma_{u}^{<t>} \odot C^{<t>}
\end{aligned}
\end{equation*}
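For concreteness, here is a minimal NumPy sketch of one forward step implementing exactly these equations. The parameter names (`Ur`, `Wr`, `br`, etc.) are my own and simply mirror the symbols above; the cache of intermediates is what a backward pass would need:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_forward_step(a_prev, x_t, p):
    # p is a dict of parameters: Ur, Wr, br, Uu, Wu, bu, Uc, Wc, bc
    gamma_r = p["Ur"] @ a_prev + p["Wr"] @ x_t + p["br"]   # reset-gate pre-activation
    gamma_u = p["Uu"] @ a_prev + p["Wu"] @ x_t + p["bu"]   # update-gate pre-activation
    Gr = sigmoid(gamma_r)                                  # Gamma_r^{<t>}
    Gu = sigmoid(gamma_u)                                  # Gamma_u^{<t>}
    c = p["Uc"] @ (Gr * a_prev) + p["Wc"] @ x_t + p["bc"]  # candidate pre-activation
    C = np.tanh(c)                                         # C^{<t>}
    a_t = (1.0 - Gu) * a_prev + Gu * C                     # a^{<t>}
    cache = (a_prev, x_t, Gr, Gu, C)                       # keep intermediates for backprop
    return a_t, cache
```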

Only at the final time step $T$ do we compute

\begin{equation*}
\begin{split}
v^{<T>} & = W_{ya} \, a^{<T>} + b_{y}, \\
y^{<T>} & = \sigma\left(v^{<T>}\right).
\end{split}
\end{equation*}

Hence, the loss function is only computed at the final time $T$ over a single training example as

$$
\mathcal{L} = - \left[y_{true} \times \ln \left(y^{<T>}\right) + \left(1-y_{true}\right) \times \ln \left(1-y^{<T>}\right)\right].
$$
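As a starting point for the backward pass, the chain rule through the sigmoid output gives the usual simplification (a standard identity for cross-entropy with a sigmoid, not something specific to the GRU):

$$
\frac{\partial \mathcal{L}}{\partial y^{<T>}} = -\frac{y_{true}}{y^{<T>}} + \frac{1-y_{true}}{1-y^{<T>}},
\qquad
\frac{\partial \mathcal{L}}{\partial v^{<T>}} = y^{<T>} - y_{true},
$$

so that

$$
\frac{\partial \mathcal{L}}{\partial W_{ya}} = \left(y^{<T>} - y_{true}\right) \left(a^{<T>}\right)^{\top}, \qquad
\frac{\partial \mathcal{L}}{\partial b_{y}} = y^{<T>} - y_{true}, \qquad
\frac{\partial \mathcal{L}}{\partial a^{<T>}} = W_{ya}^{\top} \left(y^{<T>} - y_{true}\right).
$$

The gradient $\partial \mathcal{L} / \partial a^{<T>}$ is then what has to be propagated backwards through the GRU equations for $t = T, T-1, \dots, 1$.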

The activation functions are
$$
\sigma(x) = \frac{1}{1+e^{-x}}
$$

$$
\tanh(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}
$$
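The backward pass also needs the derivatives of these activations, which have the standard closed forms:

$$
\sigma'(x) = \sigma(x)\left(1-\sigma(x)\right), \qquad \tanh'(x) = 1 - \tanh^{2}(x).
$$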

The question is, how can I compute the derivatives w.r.t. all the parameters?
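To show roughly what I have in mind, here is my own sketch of one backward step obtained by applying the chain rule to the equations above, using the cache and parameter names from the earlier forward snippet (the gradient names `dUr`, `dWr`, ... are mine; I am not certain this is fully correct, which is why I am asking):

```python
import numpy as np  # same conventions as in gru_forward_step above

def gru_backward_step(da_t, cache, p):
    """One backward step through the GRU, given dL/da^{<t>} (da_t),
    the forward cache, and the parameter dict p."""
    a_prev, x_t, Gr, Gu, C = cache
    grads = {}

    # a^{<t>} = (1 - Gu) * a_prev + Gu * C
    dGu = da_t * (C - a_prev)
    dC = da_t * Gu
    da_prev = da_t * (1.0 - Gu)

    # C = tanh(c),  c = Uc (Gr * a_prev) + Wc x + bc
    dc = dC * (1.0 - C ** 2)
    grads["Uc"] = np.outer(dc, Gr * a_prev)
    grads["Wc"] = np.outer(dc, x_t)
    grads["bc"] = dc
    d_ra = p["Uc"].T @ dc          # gradient w.r.t. (Gr * a_prev)
    dGr = d_ra * a_prev
    da_prev += d_ra * Gr

    # Gu = sigmoid(gamma_u),  gamma_u = Uu a_prev + Wu x + bu
    dgamma_u = dGu * Gu * (1.0 - Gu)
    grads["Uu"] = np.outer(dgamma_u, a_prev)
    grads["Wu"] = np.outer(dgamma_u, x_t)
    grads["bu"] = dgamma_u
    da_prev += p["Uu"].T @ dgamma_u

    # Gr = sigmoid(gamma_r),  gamma_r = Ur a_prev + Wr x + br
    dgamma_r = dGr * Gr * (1.0 - Gr)
    grads["Ur"] = np.outer(dgamma_r, a_prev)
    grads["Wr"] = np.outer(dgamma_r, x_t)
    grads["br"] = dgamma_r
    da_prev += p["Ur"].T @ dgamma_r

    return da_prev, grads
```

My understanding is that for full BPTT the per-step gradients are summed over $t$, and the `da_prev` returned at step $t$ becomes `da_t` for step $t-1$; since the loss is only computed at $T$, only the final step receives a direct contribution from the output layer. Is this the right way to proceed?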
