How is the backbone of two neural networks trained?

Data Science · Asked by satinder singh on April 8, 2021

Suppose I have a backbone network (a convolutional neural network). Its output is fed into two separate neural networks, each building on the features extracted by the CNN. If I want to train this complete network from scratch on two different tasks, the weights of the layers after the backbone can be updated easily, but how should I update the weights of the backbone itself? I can compute gradients of the backbone parameters with respect to both losses, so should I take the mean of the two gradients, or some weighted sum? And if it is a weighted sum, how would the weights of that sum be chosen or updated?

Thanks

One Answer

In general, any sort of gradient-based learning is done on scalar functions, i.e. functions f: ℝ^n → ℝ (in fact, that is what a gradient is defined for). If you want to set up a minimization problem, you need a single value to minimize, not several.
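Concretely: if L₁ and L₂ are the two task losses and θ denotes the backbone parameters, then for a weighted sum L = w₁·L₁ + w₂·L₂, linearity of the gradient gives ∇_θ L = w₁·∇_θ L₁ + w₂·∇_θ L₂. Taking the mean of the two backbone gradients is just the special case w₁ = w₂ = ½, so choosing how to combine the gradients is the same as choosing the loss weights.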

This means your loss ultimately has to be a scalar (a single number). Combining the gradients in the middle (i.e. before backpropagating into your backbone) is equivalent to just combining the losses, and a weighted loss is easier to implement.
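As a minimal sketch of that approach in PyTorch (the module names, dummy data, and the fixed weights w1 and w2 below are illustrative assumptions, not from the answer):

```python
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """Toy CNN feature extractor shared by both task heads."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool to (16, 1, 1)
            nn.Flatten(),             # -> feature vector of size 16
        )

    def forward(self, x):
        return self.features(x)

backbone = SharedBackbone()
head_a = nn.Linear(16, 10)  # e.g. 10-way classification head
head_b = nn.Linear(16, 1)   # e.g. scalar regression head

# One optimizer over backbone and both heads.
params = (list(backbone.parameters())
          + list(head_a.parameters())
          + list(head_b.parameters()))
optimizer = torch.optim.SGD(params, lr=1e-2)

loss_fn_a = nn.CrossEntropyLoss()
loss_fn_b = nn.MSELoss()
w1, w2 = 1.0, 0.5  # fixed loss weights, treated as hyperparameters

# Dummy batch for illustration.
x = torch.randn(8, 3, 32, 32)
y_a = torch.randint(0, 10, (8,))
y_b = torch.randn(8, 1)

features = backbone(x)  # single shared forward pass
loss = (w1 * loss_fn_a(head_a(features), y_a)
        + w2 * loss_fn_b(head_b(features), y_b))

optimizer.zero_grad()
loss.backward()  # one scalar; the backbone receives the weighted-sum gradient
optimizer.step()
```

Because `loss` is a single scalar, one call to `loss.backward()` automatically accumulates the weighted sum of both task gradients into the backbone. Here w1 and w2 are fixed hyperparameters to tune; simply making them learnable parameters would not work as-is, since the optimizer would just drive them toward zero to shrink the loss.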

For reference, you can watch this talk by Andrej Karpathy on how Tesla does multi-task learning on a single backbone and how they deal with combining different losses.

Correct answer by Simon Boehm on April 8, 2021
