
1x1 convolutions, equivalence with fully connected layer

Data Science Asked by nixon on March 11, 2021

I’m confused by the concept of equating a 1×1 convolution with a fully connected layer. Take the following simple example of a 1×1 convolution of 2 input channels each of size 2×2, and a single output channel.

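[Figure: a 1×1 convolution applied to an input of 2 channels, each of size 2×2, producing a single 2×2 output channel; inputs and outputs are colour coded by location]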

The only way I can relate this to fully connected layers is to say that there are 4 fully connected layers, one for each location in the input feature map (inputs and outputs colour coded).

From what I can understand, my interpretation is consistent with the Network in Network paper [Lin et al. 2013], which describes the 1×1 convolution as being equivalent to cross channel parametric pooling:

"The cross channel parametric pooling layer is also equivalent to a convolution layer with 1×1 convolution kernel."

I have seen this one from Yann LeCun equating 1×1 convolutions to a fully connected layer. I have also read this answer, but I’m just not seeing the equivalence between a 1×1 convolution over an input volume and a single fully connected layer…

Any insight would be appreciated, if you can please relate back to the example above. Thanks!

One Answer

The interpretation that the 1x1 convolution given in the OP can be duplicated with four separate fully connected layers is correct (see diagram), with the caveat that all four layers share the same weights, because the same 1x1 kernel is applied at every location. Also, in at least some implementations, the kernel weights used in a 1x1 convolution can be made trainable in the same way as the weights in a fully connected layer.

That said, not every fully connected layer can be mathematically duplicated by an equivalent 1x1 convolution. A 1x1 convolution performs a "column-wise dot product": for each output channel, every pixel column (the values of all input channels at one spatial location) in a multi-channel feature map is reduced to a single number (pixel). A fully connected layer applied to the whole input mixes values differently, since it can combine values from different spatial locations, which a 1x1 convolution never does.

In summary, fully connected layers and 1x1 convolutions each have their own use cases. Some overlap among these use cases exists; however, the two are not mathematically equivalent in a general sense.
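One way to see why the equivalence is not general is to write the OP's 1x1 convolution as a single fully connected layer on the flattened input. The NumPy sketch below assumes the OP's shapes (2 input channels of size 2×2, 1 output channel) and omits the bias; the variable names and random values are illustrative only. The resulting 4×8 weight matrix is heavily constrained, whereas a generic fully connected layer from 8 inputs to 4 outputs could set all 32 entries freely.

```python
import numpy as np

rng = np.random.default_rng(0)

C, H, W = 2, 2, 2                      # channels, height, width (as in the OP)
x = rng.standard_normal((C, H, W))     # input volume
w = rng.standard_normal(C)             # 1x1 kernel: one weight per input channel

# 1x1 convolution: dot product over the channel axis at every location.
conv_out = np.einsum("c,chw->hw", w, x)          # shape (H, W)

# The same map written as ONE dense layer on the flattened input.
# Row p of M only touches the C inputs at location p, and every row
# reuses the same C weights.
M = np.kron(w[None, :], np.eye(H * W))           # shape (H*W, C*H*W) = (4, 8)
dense_out = M @ x.reshape(-1)                    # shape (4,)

print(np.allclose(dense_out, conv_out.reshape(-1)))  # the two agree
print(M.size, np.count_nonzero(M), w.size)           # total entries, nonzeros, free weights
```

Only 8 of the 32 entries of `M` are nonzero, and those 8 are copies of just 2 distinct weights. A fully connected layer whose matrix does not have this pattern, for example one that mixes values from two different locations, has no 1x1 convolution counterpart on this input.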

Four separate "dense layers" equivalent to the 1x1 convolution in OP
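The per-location picture can be checked numerically. This is a minimal NumPy sketch, assuming the OP's shapes (2 input channels of size 2×2, 1 output channel); the helper `dense`, the bias value, and the random inputs are hypothetical and only for illustration. It applies the same tiny dense layer (2 inputs, 1 output) at each of the four locations and compares the result with a 1x1 convolution computed in one step.

```python
import numpy as np

rng = np.random.default_rng(1)

x = rng.standard_normal((2, 2, 2))   # (channels, height, width)
w = rng.standard_normal(2)           # shared weights, one per input channel
b = 0.5                              # hypothetical shared bias


def dense(v, weights, bias):
    """A dense layer with len(v) inputs and a single output (no activation)."""
    return float(weights @ v + bias)


# Four "dense layers", one per spatial location, all sharing w and b.
per_location = np.empty((2, 2))
for i in range(2):
    for j in range(2):
        per_location[i, j] = dense(x[:, i, j], w, b)

# 1x1 convolution: contract the channel axis of x with the kernel weights.
conv_1x1 = np.tensordot(w, x, axes=([0], [0])) + b   # shape (2, 2)

print(np.allclose(per_location, conv_1x1))
```

The agreement depends on the weight sharing: if each location had its own independent weights, the four dense layers would no longer correspond to a single 1x1 convolution.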

Correct answer by Aether on March 11, 2021
