TransWikia.com

One Neural network with multiple outputs or multiple neural networks with a single output?

Data Science Asked by Gerges on March 30, 2021

I am building a feed-forward deep learning model using tabular data. The inputs are numeric features or categorical features (represented with embeddings). There are as many outputs as there are numeric input features.

Is there any known research showing whether a single model with multiple outputs is better or worse than multiple models, each with a single output?

In essence, with N observations and M outputs, a single model minimizes:

$$
\frac{1}{N}\sum_n^N\sum_m^M \left(y_m^{(n)} - \hat{y}_m^{(n)} \right)^2
$$

while multiple single-output models each minimize:

$$
\frac{1}{N}\sum_n^N \left(y_m^{(n)} - \hat{y}_m^{(n)} \right)^2
$$

for a single value of $m$.
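As a quick sanity check on how the two objectives relate, here is a minimal sketch in plain Python. The helper names `joint_mse` and `per_output_mse` and the random data are made up for illustration. It shows that the joint loss is just the sum of the $M$ per-output losses, so the two setups differ in whether parameters are shared, not in the loss itself.

```python
import random

def joint_mse(y, y_hat):
    """(1/N) * sum over observations n and outputs m of the squared error."""
    n_obs = len(y)
    total = sum(
        (t - p) ** 2
        for row_t, row_p in zip(y, y_hat)
        for t, p in zip(row_t, row_p)
    )
    return total / n_obs

def per_output_mse(y, y_hat, m):
    """(1/N) * sum over observations n of the squared error for output m only."""
    n_obs = len(y)
    return sum((row_t[m] - row_p[m]) ** 2 for row_t, row_p in zip(y, y_hat)) / n_obs

# Hypothetical targets and predictions: N observations, M outputs.
random.seed(0)
N, M = 5, 3
y = [[random.gauss(0, 1) for _ in range(M)] for _ in range(N)]
y_hat = [[random.gauss(0, 1) for _ in range(M)] for _ in range(N)]

joint = joint_mse(y, y_hat)
separate = [per_output_mse(y, y_hat, m) for m in range(M)]

# The single-model objective equals the sum of the M single-output objectives.
print(joint, sum(separate))
```

With fully separate models, each term is minimized over its own parameters. With one model, all terms are minimized over a shared set of parameters.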

Any reason one would be preferred over the other, or do I just have to try and see for myself?

One Answer

Given the information you provided, the most honest answer is: you have to test it yourself; there is no general answer.

Still, it has been shown empirically in research that a neural network may benefit from having multiple outputs.

So let's say we have a neural network that has multiple outputs. Further, let us group them into specific tasks:

For example:

  • The output neurons of group 1 tell if the image contains a dog or a cat.
  • The output neurons of group 2 tell the size of the animal (width and height).
  • The output neurons of group 3 tell the color of the animal's hair (in some encoding).

and so on...

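A common example would be Faster R-CNN vs. Mask R-CNN: Mask R-CNN adds a mask-prediction branch alongside the class and bounding-box outputs of Faster R-CNN.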

Assume that $g$ denotes the number of different groups of output neurons.

Now if you take a feed-forward neural network, you will have common layers that eventually branch to the different output groups. Let us call $\pi$ the function that maps an input image to this particular last common layer $L$ and let $\phi_{j}$ be the function that takes the information from layer $L$ to output the result of group $j$.

Thus, given an input image $\mathbf{I}$, the neural network maps it to $\begin{pmatrix} \phi_{1}(\pi(\mathbf{I})) \\ \vdots \\ \phi_{g}(\pi(\mathbf{I})) \end{pmatrix}$.

The output of the last common layer $\pi(\mathbf{I})=:\mathbf{f}$ can be understood as an image descriptor $\mathbf{f}$ of the input image $\mathbf{I}$.

In particular, all predicted outputs rely on the information contained in $\mathbf{f}$.
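The shared-trunk structure described above can be sketched in plain Python. This is only an illustration under assumptions: the input is a small flat vector rather than an image, the layer sizes (4 inputs, 8 shared units, three heads with 2, 2 and 3 outputs) are arbitrary, and the helpers `dense` and `init_layer` are made up for this example.

```python
import math
import random

def dense(weights, bias, x):
    """Affine layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (hypothetical initialization)."""
    weights = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
trunk = init_layer(8, 4, rng)                        # parameters of the common layers
heads = [init_layer(k, 8, rng) for k in (2, 2, 3)]   # one head per output group

def pi(x):
    """Common layers: map the input to the shared descriptor f."""
    weights, bias = trunk
    return [math.tanh(v) for v in dense(weights, bias, x)]

def phi(j, f):
    """Head j: map the shared descriptor f to the outputs of group j."""
    weights, bias = heads[j]
    return dense(weights, bias, f)

def network(x):
    f = pi(x)  # computed once and reused by every head
    return [phi(j, f) for j in range(len(heads))]

x = [0.5, -1.0, 2.0, 0.1]  # stand-in for the input
outputs = network(x)
print([len(o) for o in outputs])
```

Every head reads the same `f`, so anything a head needs must be encoded there. With separate models, each model would have its own copy of `pi` and its own descriptor.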

$\textbf{Therefore}$: Merging multiple outputs into a single neural network can be understood as a regularization technique. The image descriptor $\mathbf{f}$ must contain not only the information on whether the image shows a dog or a cat, but also all the other information. It must therefore be a more comprehensive (or "more realistic") description of the input, which makes it more difficult for the network to overfit. The network cannot solve a specific task using a non-plausible explanation, as the corresponding image descriptor would lead to bad results on the other tasks.

As a consequence, adding additional (auxiliary) tasks to the neural network can improve the accuracy on the initial task, even if you are not interested in the predictions for these additional tasks.
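One way to make this concrete is to write the training objective as a weighted sum of per-group losses. This is a sketch, not a statement about any particular paper: the weights $\lambda_j$, the per-group losses $\mathcal{L}_j$, and the parameter names $\theta_\pi$ (common layers) and $\theta_j$ (head $j$) are notation introduced here for illustration.

$$
\mathcal{L}(\theta_\pi, \theta_1, \dots, \theta_g) = \sum_{j=1}^{g} \lambda_j \, \mathcal{L}_j\big(\phi_j(\pi(\mathbf{I}; \theta_\pi); \theta_j)\big),
\qquad
\nabla_{\theta_\pi} \mathcal{L} = \sum_{j=1}^{g} \lambda_j \, \nabla_{\theta_\pi} \mathcal{L}_j
$$

Each head's parameters $\theta_j$ only receive gradient from their own loss, whereas the shared parameters $\theta_\pi$ are pulled by every task at once. For an auxiliary task, $\lambda_j$ controls how strongly it shapes $\mathbf{f}$.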

So essentially, if there is a common description of your data that can be used to solve your required tasks, the system may benefit from using one model with multiple outputs.

You may want to look into the literature, e.g. on collaborative learning, multi-task learning, and auxiliary tasks.

I hope this answers your question.

Correct answer by Graph4Me Consultant on March 30, 2021
