Data Science Asked by Wickkiey on March 27, 2021
Hi, I am trying to understand neural networks with PyTorch, and I have a doubt about the gradient calculations.
import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)
# in your training loop:
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step() # Does the update
From the above code, I understand that loss.backward() calculates the gradients. What I am not sure about is how this information is shared with the optimizer so that it can update the parameters.
Can anyone explain this?
Thanks in advance!
Recall that you passed net.parameters() to the optimizer, so it holds references to the model's parameter tensors along with their associated data. One of the fields attached to each learnable parameter tensor is a gradient buffer. Hence, backward() not only computes the gradients but also stores them in each parameter tensor, so the gradient for a parameter lives alongside that parameter. In other words, for some parameter $\theta_i$, backward() stores $\partial \mathcal{L}(\Theta) / \partial \theta_i$ along with that parameter. The optimizer.step() call then simply updates each parameter using the gradient stored with it.
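To see this concretely, here is a minimal, self-contained sketch (the tiny nn.Linear model, dummy input, and targets are made up purely for illustration): after loss.backward(), each parameter's .grad buffer is populated, and optimizer.step() reads exactly those buffers to update the parameters.

import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical tiny model, just so the example runs end to end
net = nn.Linear(3, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)

x = torch.randn(4, 3)        # dummy input batch
target = torch.randn(4, 1)   # dummy targets

optimizer.zero_grad()        # clear any gradients left from a previous step
loss = criterion(net(x), target)
loss.backward()              # fills the .grad buffer of every parameter

# Each parameter tensor now carries its own gradient in .grad
for name, p in net.named_parameters():
    print(name, p.grad.shape)

# step() reads those same .grad buffers; for plain SGD the update is
# roughly p <- p - lr * p.grad for every parameter the optimizer was given
optimizer.step()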
Answered by user3658307 on March 27, 2021