Cross Validated, asked on November 9, 2021
It seems to be common knowledge in computer vision that, in a convolutional neural network, layers closer to the input image have higher spatial resolution but lower semantic value, and vice versa for layers further from the image. I do not quite understand where this comes from. Any references or explanations would be much appreciated.
A convolutional neural network with nonlinear activation functions performs nonlinear image processing. Let an $X \times Y$ 2D image be defined as $I(x,y)$ and a convolutional neural network as $NN(x,y)$. Convolution in two dimensions can be written as

$ I(x,y) \circledast NN(x,y) $
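Written out for a discrete image and a single first-layer kernel (denoted $K(u,v)$ here purely for illustration; this symbol is not used above), the operation becomes

$ (I \circledast K)(x,y) = \sum_{u} \sum_{v} I(x-u,\, y-v)\, K(u,v) $

so each output pixel is a weighted sum over a local neighbourhood of input pixels, with the weights given by the kernel.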
The first layer of hidden nodes learns to represent convolution kernels that are predictive with respect to the desired output of the network. By themselves, these learned kernels come to recognize different geometric features in the training images, much like classical edge- and corner-detection operators, among many others.
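As an illustration of what such a low-level geometric operator does, the following sketch convolves a toy image with a classic Sobel edge-detection kernel. The Sobel kernel is a stand-in example here, not necessarily one of the operators the list above referred to.

```python
import numpy as np
from scipy.signal import convolve2d

# Classic Sobel kernel for horizontal gradients: a hand-crafted example of
# the kind of geometric feature operator a first conv layer tends to learn.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# A toy 8x8 "image": dark left half, bright right half (a vertical edge).
image = np.zeros((8, 8))
image[:, 4:] = 1.0

# 2D convolution I(x,y) ⊛ K(x,y); 'same' keeps the output at image size.
response = convolve2d(image, sobel_x, mode="same", boundary="symm")

print(response[4])  # strong responses in the columns next to the edge
```

A trained first layer typically contains many such kernels, each tuned to a different orientation or local pattern.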
When the first hidden layer of the network is not initialized with such operators (some practitioners do use that approach), geometric feature operators appear to emerge during training. From a semantic viewpoint these features are low-level. Subsequent hidden layers are needed in order to combine such geometric features into recognized objects, such as particular faces or handwritten digits for that matter.
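To see the resolution-versus-semantics trade-off the question asks about, here is a minimal, hypothetical stack of convolution and pooling layers (written with PyTorch for illustration; the channel counts and image size are arbitrary assumptions). Each pooling step halves the spatial resolution while the convolutions increase the number of feature channels, i.e. later layers describe larger image regions with more abstract features.

```python
import torch
import torch.nn as nn

# Minimal illustrative stack: each block halves spatial resolution
# and increases the number of feature channels.
layers = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
)

x = torch.randn(1, 1, 64, 64)  # a single-channel 64x64 "image"
for layer in layers:
    x = layer(x)
    if isinstance(layer, nn.MaxPool2d):
        print(tuple(x.shape))
# Prints (1, 16, 32, 32) -> (1, 32, 16, 16) -> (1, 64, 8, 8):
# spatial resolution shrinks while channel (feature) depth grows.
```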
It is a natural process of image processing that the available pixels are combined into features, whose combined presence or absence is associated with the recognition of a particular object. See the downloadable article: Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature, Vol. 521, pp. 436-444, 2015.
The human brain is also known to perform this kind of low-level to high-level image processing; much literature on the subject is readily available.
Answered by Match Maker EE on November 9, 2021