TransWikia.com

Math behind 2D convolution for RGB images

Data Science Asked on August 16, 2021

I have read many threads discussing why a 2D convolutional layer is typically used for RGB images in neural networks.
I have also read that it is possible to use a 3D conv layer.

What I do not understand is the math behind it.

Say your image is 300 by 300, and the Conv2D layer has kernel_size = (3, 3) and filters = 16. The input_shape would be (300, 300, 3) because there are 3 channels (RGB).

  1. Since the kernel is 2D, the convolution can only be done on 1 channel at a time. Is that correct?
  2. Is the same kernel applied/convolved across the 3 channels? If so, there should be 3 outputs, but the dimension of the output is (298, 298, 16). Is it averaged over the 3 channels?

2 Answers

If your image is 3D (height, width, channels), then your kernel should be 3D too. Of course, you can also apply a 2D kernel, in which case the same filter is applied to each channel separately.


However, normally you apply a 3D filter to a 3D image. So if you apply 16 filters of size 3x3x3 to an image of size 6x6x3, you get 16 outputs of size 4x4. If you instead applied 16 filters of size 3x3, you would get 16 outputs of size 4x4x3, because each channel would be treated separately, leaving 1 more dimension for you to handle (16x4x4x3 instead of 16x4x4). When you use a 3D filter, the output of the convolution operation depends on all three dimensions. In other words, you multiply the 27 values of your 3x3x3 filter with the corresponding 27 values of the image (3x3 pixels and their 3 channels), and then add them up to get a single output value.
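The two cases can be compared with a small NumPy sketch. It assumes a 6x6 image with 3 channels, no padding, a stride of 1, and random values in place of real pixels and learned weights; the function names are made up for illustration and are not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6, 3))           # height, width, channels
filters_3d = rng.standard_normal((16, 3, 3, 3))  # 16 filters, each 3x3 and spanning 3 channels
filters_2d = rng.standard_normal((16, 3, 3))     # 16 purely 2D filters


def conv_full_depth(img, filt):
    """Valid, stride-1 sliding window with a filter that spans every channel."""
    kh, kw, _ = filt.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # 27 products (3x3 pixels times 3 channels) summed into one number
            out[i, j] = np.sum(img[i:i + kh, j:j + kw, :] * filt)
    return out


def conv_per_channel(img, filt):
    """Valid, stride-1 sliding window applying one 2D filter to each channel separately."""
    kh, kw = filt.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((out_h, out_w, img.shape[2]))
    for i in range(out_h):
        for j in range(out_w):
            # Sum over the spatial window only, so the channel axis survives
            window = img[i:i + kh, j:j + kw, :]
            out[i, j, :] = np.sum(window * filt[:, :, None], axis=(0, 1))
    return out


out_3d = np.stack([conv_full_depth(image, f) for f in filters_3d])
out_2d = np.stack([conv_per_channel(image, f) for f in filters_2d])
print(out_3d.shape)  # (16, 4, 4)
print(out_2d.shape)  # (16, 4, 4, 3)
```

The loops are written for readability rather than speed; deep learning libraries compute the same sums far more efficiently.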

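The answer to your question 1 is yes only if the kernel is purely 2D: such a filter would be applied 1 channel at a time. A Conv2D layer on an RGB image uses filters that span all 3 channels, so each filter processes the channels together.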
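For the layer in the question, the output shape and the number of trainable weights can be checked with a few lines of arithmetic. This is a rough sketch that assumes no padding, a stride of 1 by default, and filters that span all 3 input channels, which is how Keras's Conv2D lays out its kernel; the two helper functions are hypothetical and only illustrate the formulas.

```python
def conv2d_output_shape(height, width, kernel_size, filters, stride=1):
    """Output shape (height, width, channels) of a 2D convolution without padding."""
    kh, kw = kernel_size
    out_h = (height - kh) // stride + 1
    out_w = (width - kw) // stride + 1
    return (out_h, out_w, filters)


def conv2d_param_count(kernel_size, in_channels, filters, use_bias=True):
    """Number of trainable values when every filter spans all input channels."""
    kh, kw = kernel_size
    weights = kh * kw * in_channels * filters
    return weights + (filters if use_bias else 0)


shape = conv2d_output_shape(300, 300, (3, 3), 16)
params = conv2d_param_count((3, 3), 3, 16)
print(shape)   # (298, 298, 16)
print(params)  # 448 = 3*3*3*16 weights + 16 biases
```

The last dimension of the output is the number of filters, not the number of input channels, because each filter collapses the 3 channels into a single value per position.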

Check the link for a very good explanation by Andrew Ng.

Correct answer by Shahriyar Mammadli on August 16, 2021

Actually, each filter is a collection of kernels, with one kernel for every input channel to the layer.

Each filter in a convolution layer produces one and only one output channel.

The kernels of a filter slide over their respective input channels, and the processed versions are then summed together to form one channel. Each kernel of a filter produces a processed version of its own input channel, and the filter as a whole produces one overall output channel.
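A small NumPy sketch can illustrate this view: one 2D kernel per channel, with the per-channel results summed, gives the same numbers as sliding the whole filter over all channels at once. The sketch assumes a 5x5 image with 3 channels, no padding, a stride of 1, no bias term, and random values; the helper name is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
image = rng.standard_normal((5, 5, 3))  # height, width, channels
filt = rng.standard_normal((3, 3, 3))   # kernel height, kernel width, channels


def correlate2d_valid(channel, kernel):
    """Slide one 2D kernel over one 2D channel (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h, out_w = channel.shape[0] - kh + 1, channel.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(channel[i:i + kh, j:j + kw] * kernel)
    return out


# One kernel per input channel, each producing its own processed map
per_channel = [correlate2d_valid(image[:, :, c], filt[:, :, c]) for c in range(3)]

# Summing the three maps gives the single output channel of the filter
summed = sum(per_channel)

# Same result computed by treating the filter as one 3x3x3 block
direct = np.empty((3, 3))
for i in range(3):
    for j in range(3):
        direct[i, j] = np.sum(image[i:i + 3, j:j + 3, :] * filt)

print(summed.shape)                 # (3, 3)
print(np.allclose(summed, direct))  # True
```

Note that the channels are summed, not averaged, and in a real layer a bias is usually added to the sum afterwards.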

Refer to the link to know more.

Answered by prashant0598 on August 16, 2021
