What is the difference between using 8 filters twice and 16 filters once in the convolutional layers of a CNN?

Question

Why would we use two convolutional layers in a row with the same specification? For example, the first two layers of VGG16 both use 3 x 3 filters with a depth of 64. What is the difference if we instead use a single layer of 3 x 3 filters with a depth of 128?


Neil Slater · Accepted Answer

Each layer can only transform the output of the layer below it to a limited extent. It has one linear component (a weighted sum of the outputs of the layer beneath it) and one non-linear component (typically ReLU).
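To make the "linear part plus ReLU" point concrete, here is a minimal NumPy sketch. It assumes a single channel, no padding and no biases, and the helper names (`correlate_valid`, `compose_kernels`) are hypothetical. Without a ReLU in between, two stacked 3 x 3 layers collapse into one equivalent 5 x 5 layer, so the stack gains nothing over a single layer; the non-linearity is what stops that collapse.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate_valid(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Weighted sum over every k-sized patch of x (cross-correlation, no padding).

    This is the linear part of a single-channel convolutional layer.
    """
    windows = sliding_window_view(x, k.shape)  # shape (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijab,ab->ij", windows, k)


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def compose_kernels(k1: np.ndarray, k2: np.ndarray) -> np.ndarray:
    """Single kernel equivalent to applying k1 and then k2 with no non-linearity."""
    out = np.zeros((k1.shape[0] + k2.shape[0] - 1, k1.shape[1] + k2.shape[1] - 1))
    for a in range(k1.shape[0]):
        for b in range(k1.shape[1]):
            # each weight of k1 spreads a shifted, scaled copy of k2
            out[a:a + k2.shape[0], b:b + k2.shape[1]] += k1[a, b] * k2
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k1 = rng.standard_normal((3, 3))
k2 = rng.standard_normal((3, 3))

# Two purely linear 3x3 layers == one 5x5 layer.
linear_stack = correlate_valid(correlate_valid(x, k1), k2)
single_5x5 = correlate_valid(x, compose_kernels(k1, k2))
print(np.allclose(linear_stack, single_5x5))  # True

# With a ReLU after each layer, the stack no longer matches the collapsed kernel.
relu_stack = relu(correlate_valid(relu(correlate_valid(x, k1)), k2))
print(np.allclose(relu_stack, relu(single_5x5)))
```

The same collapse happens with multiple channels and with biases; the single-channel case just keeps the sketch short.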

In theory, a single layer in a fully-connected network can approximate any function if it is large enough. However, a stack of similar, smaller layers is more expressive while using fewer resources: for the same number of parameters, you get a more flexible function approximator. Beyond some level of complexity in your target function, the cost (in CPU time, training data and training effort) of making a single layer wider is higher than the cost of stacking more similar layers.
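A rough illustration of the parameter cost, assuming square kernels, the same channel count C on input and output, and biases left out of the comparison: two stacked 3 x 3 layers cover the same 5 x 5 input region as one 5 x 5 layer with 18C² instead of 25C² weights, and three stacked 3 x 3 layers match one 7 x 7 layer with 27C² instead of 49C². This is the comparison made in the VGG paper; the helper names below are hypothetical.

```python
def conv_params(kernel: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Trainable parameters in one 2D conv layer with a square kernel."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def stack_params(kernel: int, depth: int, channels: int, bias: bool = True) -> int:
    """Parameters in `depth` stacked conv layers that all keep `channels` channels."""
    return depth * conv_params(kernel, channels, channels, bias)


C = 64
# Same 5x5 receptive field: two stacked 3x3 layers vs one 5x5 layer.
print(stack_params(3, 2, C, bias=False), conv_params(5, C, C, bias=False))  # 73728 102400
# Same 7x7 receptive field: three stacked 3x3 layers vs one 7x7 layer.
print(stack_params(3, 3, C, bias=False), conv_params(7, C, C, bias=False))  # 110592 200704
```

On top of needing fewer weights, the stacked version also applies two or three ReLUs instead of one.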

In addition, for a CNN you have to consider the receptive field. A feature map can only express values computed from what the filter can "see", which is limited by the width of the kernel. Each additional layer extends the width and height of the region of the input image that the features in the last layer are effectively computed over; for example, two stacked 3 x 3 layers with stride 1 see a 5 x 5 patch of the input. If there is a fully-connected layer after the convolutional layers, you can in theory compensate for a poor receptive field with a very large fully-connected layer, but then you are back to the first problem: a wide network with more parameters than are strictly necessary to learn the function.
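The growth of the receptive field can be computed with a simple recurrence: each layer adds (kernel - 1) times the product of the strides of all earlier layers. A small sketch, assuming square kernels and no dilation (`receptive_field` is a hypothetical helper, not a library function):

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Width/height (in input pixels) seen by one unit of the last layer.

    `layers` is a sequence of (kernel_size, stride) pairs, input side first.
    Padding does not change the receptive field size.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # input-pixel distance between adjacent units of the current layer
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


print(receptive_field([(3, 1)]))          # 3
print(receptive_field([(3, 1), (3, 1)]))  # 5
print(receptive_field([(3, 1)] * 3))      # 7
# VGG16-style start: two 3x3 convs, a 2x2 max-pool with stride 2, then another 3x3 conv
print(receptive_field([(3, 1), (3, 1), (2, 2), (3, 1)]))  # 10
```

Pooling layers count as layers here too, since they also aggregate over a window; after a stride-2 pool, each further 3 x 3 layer grows the receptive field twice as fast.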
