What does the phrase 'underlying mapping' mean?

Question

In the paper Deep Residual Learning for Image Recognition by Kaiming He et al., there is a phrase 'underlying mapping'. What does this mean?

neuroguy123 · Answer

Functions map domains to ranges. Neural networks learn such functions, so you can think of a neural network as a mapping from an input space to an output space. Deep neural networks stack many layers, and each layer (or group of layers) can be viewed as a sub-function of the network with its own underlying mapping. For example, each block in a convolutional network consists of some convolution layers plus helper layers such as normalization and pooling.
The paper is about the discovery that residual connections help 'very' deep networks converge to an accuracy at least as good as that of a shallower version of the same network.
Basically, it was found that very deep convolutional networks actually underperformed compared to shallower versions, and that this was not due to overfitting, since the deeper networks also had higher training error. Intuitively, this is confusing because you would expect to be able to construct a deeper network from a shallower one and have it perform at least as well. How? Take a network of, say, 5 convolution layers and add 5 new ones. If the 5 new layers learned an underlying mapping that was the identity function (i.e. the output equals the input), then the two networks would perform identically. This is not what happens in practice: the deeper network has a difficult time learning identity mappings.
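The construction argument can be sketched in a few lines of plain Python. This is only a toy illustration: the scalar "layers" and the names `run`, `identity`, `shallow` and `deep` are made up for the example and are not from the paper.

```python
from typing import Callable, List

Layer = Callable[[float], float]


def run(layers: List[Layer], x: float) -> float:
    """Apply the layers in order and return the final output."""
    for layer in layers:
        x = layer(x)
    return x


def identity(x: float) -> float:
    """A layer whose underlying mapping is the identity."""
    return x


# Hypothetical stand-ins for five already-trained layers.
shallow: List[Layer] = [lambda x, k=k: max(0.0, x + k) for k in range(5)]

# Deeper network by construction: the same five layers plus five identity layers.
deep: List[Layer] = shallow + [identity] * 5

inputs = [-3.0, 0.0, 2.5]
print([run(shallow, v) for v in inputs])
print([run(deep, v) for v in inputs])
```

Both lines print the same values, so a 10-layer solution that matches the 5-layer one exists by construction. The surprising part is that training does not find it.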
Let $H(x)$ be the underlying mapping that a sub-block (a few stacked convolution layers) should learn. Plain stacked layers trained with SGD have difficulty learning it when the useful mapping is close to the identity. The authors' insight was to have the layers fit the residual $F(x) = H(x) - x$ instead, that is, the desired mapping with the original input subtracted. This naturally leads to 'wiring' the network with a shortcut connection that adds the input back, so the block outputs $F(x) + x = H(x)$. The identity mapping is now trivial (drive $F(x)$ to zero), and $F(x)$ only has to learn the new complexity on top of the input.
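Here is a minimal sketch of that wiring, assuming tiny fully connected layers in place of convolutions and leaving out the ReLU that the paper applies after the addition. All function names are hypothetical. It compares a plain block, whose output is just the stacked layers' output, with a residual block, whose output is the stacked layers' output plus the input $x$.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix by a vector (a toy stand-in for a convolution)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(x: Vector) -> Vector:
    """Element-wise max(0, x)."""
    return [max(0.0, xi) for xi in x]


def stacked_layers(w1: Matrix, w2: Matrix, x: Vector) -> Vector:
    """Two layers with a ReLU in between."""
    return linear(w2, relu(linear(w1, x)))


def plain_block(w1: Matrix, w2: Matrix, x: Vector) -> Vector:
    """Plain block: the layers must produce the whole mapping themselves."""
    return stacked_layers(w1, w2, x)


def residual_block(w1: Matrix, w2: Matrix, x: Vector) -> Vector:
    """Residual block: the shortcut adds the input back to the layers' output."""
    out = stacked_layers(w1, w2, x)
    return [o + xi for o, xi in zip(out, x)]


zeros: Matrix = [[0.0, 0.0], [0.0, 0.0]]
x: Vector = [1.5, -2.0]

print(plain_block(zeros, zeros, x))     # all-zero weights wipe out the input
print(residual_block(zeros, zeros, x))  # all-zero weights pass the input through
```

With all-zero weights the residual block already is the identity, whereas the plain block would need carefully tuned weights to pass its input through unchanged.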
