Data Science Asked by Hermon Jay on May 2, 2021
Why would we use two convolution layers in a row with the same spec? E.g., the first two layers of VGG16 both use a 3 x 3 filter with depth 64. What is the difference compared to using a 3 x 3 filter with depth 128 once?
Each layer can only transform its input by a limited amount: it applies one linear component (a weighted sum of the outputs of the layer beneath it) and one non-linear component (typically ReLU).
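To see why the non-linear component matters, here is a small NumPy sketch (my own illustration, not part of the original answer): without a non-linearity between them, two stacked linear layers collapse into a single linear map, so stacking would add nothing.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 8))  # first "layer": linear map from 8 to 4
W2 = rng.standard_normal((2, 4))  # second "layer": linear map from 4 to 2
x = rng.standard_normal(8)

# Two purely linear layers are equivalent to one layer with weights W2 @ W1.
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)

# With a ReLU in between, the composition is no longer expressible as a
# single matrix in general, which is what gives depth its extra power.
relu = lambda v: np.maximum(v, 0.0)
y = W2 @ relu(W1 @ x)
```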
It is in theory possible to approximate any function with a large enough single layer in a fully-connected network. However, a stack of similar smaller layers is more expressive with fewer resources. That means for the same number of parameters you get a more flexible function approximator. At some level of complexity of your target function, the cost (in terms of CPU time, data required and training effort) of making a single layer wider exceeds the cost of stacking more, similar layers.
In addition, for a CNN, you have to worry about receptive field. Any feature map can only express values that the filter can "see", due to the width of the kernel. As you add more layers, each kernel applied extends the width and height of the region of the base image that the features in the last layer effectively calculate over. If you also have a fully-connected layer after the convolutional layers, then you can in theory compensate for a poor receptive field with a very large fully-connected layer - but then you are back to the first problem of a wide network with more parameters than strictly necessary to learn the function.
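The growth of the receptive field with depth follows a standard recurrence: each layer adds `(kernel - 1) * jump` to the field, where `jump` is the product of the strides so far. A minimal sketch (my own, not from the answer):

```python
def receptive_field(layers):
    """Receptive field of a stack of conv layers.

    layers: list of (kernel_size, stride) tuples, input side first.
    """
    r, jump = 1, 1
    for k, s in layers:
        r += (k - 1) * jump
        jump *= s
    return r

print(receptive_field([(3, 1), (3, 1)]))  # 5: two 3x3 convs see a 5x5 patch
print(receptive_field([(3, 1)] * 3))      # 7: three 3x3 convs see 7x7
```

This is why stacking 3x3 layers, as VGG16 does, steadily widens what each output feature can "see" without resorting to large kernels.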
Correct answer by Neil Slater on May 2, 2021