
use the same gradient to maximize one part of the model and minimize another part of the same model

Data Science Asked by user3363813 on May 8, 2021

I want to calculate the gradient and use that same gradient to minimize one part and maximize another part of the same network (a kind of adversarial setup). Ideally, there would be two optimizers responsible for the two parts of the network/model, with one of them having a negative learning rate. But it seems that PyTorch does not allow a negative learning rate.

So what I am doing instead is:

loss.backward(retain_graph=True)    # keep the graph so a second backward pass is possible
optimizer_for_one_part.step()       # update only the part that should minimize the loss

and then

(-loss).backward()
optimizer_for_other_part.step()     # update the part that should maximize the loss

The problem is that the gradient computed this second time is not simply the flipped version of the first one (the sign is flipped, of course, but the values are different), because some weights of the network (and thus of the same computation graph) have already been changed by the first optimizer step. Ideally, I want to reuse the flipped version of the previous gradient, something like the sketch below.
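Roughly, this is what I mean (the tiny two-layer model, `optimizer_a`/`optimizer_b`, and the loss here are just placeholders for my real setup):

    import torch
    from torch import nn

    # placeholders for the two parts of the model and their optimizers
    part_a, part_b = nn.Linear(8, 8), nn.Linear(8, 8)
    optimizer_a = torch.optim.SGD(part_a.parameters(), lr=1e-2)
    optimizer_b = torch.optim.SGD(part_b.parameters(), lr=1e-2)

    x = torch.randn(4, 8)
    loss = part_b(part_a(x)).pow(2).mean()   # stand-in for my real loss
    loss.backward()                          # gradients for all parameters

    # cache the gradients of the part whose loss should be maximized
    cached_grads = [p.grad.detach().clone() for p in part_b.parameters()]

    optimizer_a.step()                       # minimize the loss w.r.t. part A

    # reuse the flipped version of the *original* gradients for part B
    for p, g in zip(part_b.parameters(), cached_grads):
        p.grad = -g
    optimizer_b.step()                       # gradient ascent on the loss for part B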
How can I achieve this?

One Answer

The trick you are looking for is called a Gradient Reversal Layer. It is a layer that does nothing (i.e., it is the identity) in the forward pass, but it reverses the sign of the gradient in the backward pass, so everything behind the layer is optimized for the opposite of the loss function.

There are several PyTorch implementations available online.
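As a rough illustration (not tied to any particular library; the class names are just placeholders), such a layer can be written with torch.autograd.Function:

    import torch
    from torch import nn

    class GradReverse(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, lambd=1.0):
            ctx.lambd = lambd
            return x.view_as(x)            # identity in the forward pass

        @staticmethod
        def backward(ctx, grad_output):
            # flip (and optionally scale) the gradient flowing to earlier layers;
            # the second return value is the (non-existent) gradient for lambd
            return -ctx.lambd * grad_output, None

    class GradientReversalLayer(nn.Module):
        def __init__(self, lambd=1.0):
            super().__init__()
            self.lambd = lambd

        def forward(self, x):
            return GradReverse.apply(x, self.lambd)

    # usage: layers before the reversal layer receive flipped gradients
    # (their loss is maximized), layers after it are trained as usual
    model = nn.Sequential(
        nn.Linear(16, 16), nn.ReLU(),      # part whose loss is maximized
        GradientReversalLayer(lambd=1.0),
        nn.Linear(16, 1),                  # part whose loss is minimized
    )

With this layer in place, a single loss.backward() and a single optimizer with an ordinary (positive) learning rate are enough; there is no need for a second backward pass or a negative learning rate.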

Initially, it was introduced for unsupervised domain adaptation. By now it has quite a few applications, such as removing sensitive information from computer-vision representations or removing language identity from multilingual contextual embeddings.

Correct answer by Jindřich on May 8, 2021
