What is the time complexity for training a gated recurrent unit (GRU) neural network using back-propagation through time?

Question

Let us assume we have a GRU network with $H$ layers and $H_i$ nodes in layer $i$, trained on a dataset with $K$ tuples and $I$ features.
I have a basic idea of how the complexity of an algorithm is calculated. However, many factors affect the performance of a GRU network: the number of layers, the amount of training data (which needs to be large), the number of units in each layer, the number of epochs, possibly regularization techniques, and training with back-propagation through time. With all of these in play, I am confused. I have found useful answers on the complexity of neural networks here: https://ai.stackexchange.com/questions/5728/what-is-the-time-complexity-for-training-a-neural-network-using-back-propagation, and on bi-directional recurrent neural networks here: what is the complexity of a bidirectional recurrent neural network?, but they were not enough to clear my doubt.
I am aware that back-propagation through time is used to train recurrent neural networks, but I do not understand how it works for the bi-directional versions of recurrent neural networks.
So, I was hoping someone could help me to:

Derive the time complexity of training GRU networks via back-propagation through time.
Understand, with an example, how bi-directional recurrent neural networks are trained using back-propagation through time. (I tried following the original paper, https://ieeexplore.ieee.org/document/650093, but the backward pass of the training procedure was confusing to me.)
Find research papers in which the authors discuss the time complexity of recurrent neural networks in their applications and explanations (I have searched for these, but without much success).
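
To make the first request concrete, here is a rough sketch of the kind of count being asked for. It is not a definitive derivation. It assumes a standard stacked GRU in which each layer has three blocks (update gate, reset gate, candidate state), each with one input-to-hidden and one hidden-to-hidden matrix-vector product, and it ignores biases and element-wise operations. The sequence length `seq_len` is not part of the setup above and is introduced here as an assumption; `num_sequences` is read as the $K$ tuples, `num_features` as $I$, and `hidden_sizes` as the list of $H_i$. The `backward_factor=2` default is only a common rule of thumb for the cost of the backward pass relative to the forward pass. All function and parameter names are hypothetical.

```python
def gru_layer_macs_per_step(input_size: int, hidden_size: int) -> int:
    """Multiply-accumulates for one forward time step of one GRU layer.

    Three blocks (update gate, reset gate, candidate state), each with an
    input-to-hidden and a hidden-to-hidden matrix-vector product.
    Biases and element-wise operations are ignored as lower-order terms.
    """
    return 3 * (input_size * hidden_size + hidden_size * hidden_size)


def gru_training_macs(num_features, hidden_sizes, seq_len, num_sequences,
                      epochs=1, backward_factor=2):
    """Rough multiply-accumulate count for training with full BPTT.

    backward_factor is an assumed ratio of backward-pass cost to
    forward-pass cost (a rule of thumb, not a measured value).
    """
    forward_per_step = 0
    layer_input = num_features
    for hidden_size in hidden_sizes:
        forward_per_step += gru_layer_macs_per_step(layer_input, hidden_size)
        layer_input = hidden_size  # the next layer consumes this layer's output

    # Full BPTT unrolls every time step once forward and once backward.
    per_sequence = (1 + backward_factor) * seq_len * forward_per_step
    return epochs * num_sequences * per_sequence


if __name__ == "__main__":
    # Hypothetical sizes, chosen only for illustration.
    print(gru_training_macs(num_features=4, hidden_sizes=[8, 8],
                            seq_len=10, num_sequences=100))
```

Under these assumptions the count grows as $O(E \cdot K \cdot T \cdot \sum_i H_i (H_{i-1} + H_i))$, with $H_0 = I$, $T$ the assumed sequence length, and $E$ the number of epochs. Regularization such as dropout adds only element-wise work and would not change this order.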
