TransWikia.com

Do convolutions "flatten images"?

Data Science Asked by David Ruiz on July 9, 2021

I’m looking for a good explanation of how convolutions in deep learning work when applied to multi-channel images. For example, let’s say I have a 100 x 100 pixel image with three channels, RGB. The input tensor would then have dimensions 100 x 100 x 3.

If I apply a convolution with N filters and a stride of one, will the output dimension be:

100 x 100 x 3 x N ?

or

100 x 100 x N ?

In other words, does the convolution that is applied “flatten” the image, or is the convolution applied on a channel by channel basis?

4 Answers

In all the implementations for CNNs processing images that I have seen, the output in any layer is

Width x Height x Channels

or some permutation. This is the same number of dimensions as the input; the convolutional layers add no extra dimensions. Each feature map channel in the output of a CNN layer is a "flattened" 2D array created by adding the results of multiple 2D kernels (one for each channel in the input layer).
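To make the "adding the results of multiple 2D kernels" part concrete, here is a minimal pure-Python sketch of a stride-1 convolution without padding. It is only an illustration, not how any particular library implements it; the function names `conv2d_valid` and `shape` and the small 6 x 6 image size are assumptions chosen for the example.

```python
import random

def conv2d_valid(image, filters):
    """Naive stride-1, no-padding convolution (cross-correlation, as in most DL libraries).

    image:   nested lists indexed [row][col][in_channel]
    filters: nested lists indexed [filter][k_row][k_col][in_channel]
    returns: nested lists indexed [row][col][filter]
    """
    height, width, in_ch = len(image), len(image[0]), len(image[0][0])
    k = len(filters[0])  # side length of the (square) kernel
    out_h, out_w = height - k + 1, width - k + 1
    output = [[[0.0] * len(filters) for _ in range(out_w)] for _ in range(out_h)]
    for n, kernel in enumerate(filters):
        for r in range(out_h):
            for c in range(out_w):
                total = 0.0
                for i in range(k):
                    for j in range(k):
                        # Summing over ALL input channels is what collapses
                        # the channel axis into one value per filter.
                        for ch in range(in_ch):
                            total += image[r + i][c + j][ch] * kernel[i][j][ch]
                output[r][c][n] = total
    return output

def shape(tensor):
    """Return (rows, cols, channels) of a 3-level nested list."""
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))

random.seed(0)
H, W, C, N, K = 6, 6, 3, 5, 3  # hypothetical sizes: 6 x 6 RGB image, five 3 x 3 filters
image = [[[random.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
filters = [[[[random.random() for _ in range(C)] for _ in range(K)]
            for _ in range(K)] for _ in range(N)]

out = conv2d_valid(image, filters)
print(shape(image), "->", shape(out))  # (6, 6, 3) -> (4, 4, 5)
```

The output has one channel per filter (5 here), and the input's 3 channels no longer appear as a separate axis.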

Usually even greyscale input images are expected to be represented as Width x Height x 1 so that they fit the same pattern and the same layer model can be used.

It is entirely feasible to build a layer design which converts a standard 2D+channels input layer into a 3D+channels layer. It is not something I have seen done before, but you can never rule out that it could be useful in a specific problem.

You may also see 3D+channels convolutions in CNNs applied to video, but in that case, the structure will be some variation of

Width x Height x Frames x Channels

Correct answer by Neil Slater on July 9, 2021

It depends on the number of filters you choose. Say you have chosen 64 filters.

Your weight tensor will be of shape [3, 3, 3, 64] (the third 3 is the number of channels in the input layer, and 64 is the number of channels in the output layer), and the bias tensor will be of shape [64].

With Pad = "SAME" and stride 1, an input image of 224 * 224 * 3 gives an output of 224 * 224 * 64.

With Pad = "VALID" and stride 1, the same 224 * 224 * 3 input gives an output of 222 * 222 * 64, because a 3 * 3 filter fits in 224 - 3 + 1 = 222 positions along each axis.

Now, with an input of 222 * 222 * 64, if you want 128 filters in the next layer, the weight tensor shape will be [3, 3, 64, 128] and the bias vector shape will be [128].

The output shape will then be [222, 222, 128] if Pad = "SAME", or [220, 220, 128] if Pad = "VALID", considering stride = [1,1,1,1].

You can check these results while building the graph using layername.get_shape().
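As a rough sketch of why the input channels never show up as an extra output axis, the weight and bias shapes for the two layers above can be computed directly. The helper name `conv_layer_shapes` is hypothetical, and the [height, width, in_channels, out_channels] layout is assumed from the shapes quoted in this answer.

```python
def conv_layer_shapes(kernel_size, in_channels, out_channels):
    """Return (weight_shape, bias_shape, parameter_count) for one 2D conv layer.

    Assumes a square kernel and the [k, k, in_channels, out_channels] layout.
    """
    weight_shape = (kernel_size, kernel_size, in_channels, out_channels)
    bias_shape = (out_channels,)
    n_params = kernel_size * kernel_size * in_channels * out_channels + out_channels
    return weight_shape, bias_shape, n_params

channels = 3  # RGB input
for n_filters in (64, 128):
    weights, bias, n_params = conv_layer_shapes(3, channels, n_filters)
    print(f"in={channels} out={n_filters} weights={weights} bias={bias} params={n_params}")
    channels = n_filters  # the next layer sees one input channel per filter
```

Each filter spans every input channel, so the channel count of a layer's output is simply the number of filters in that layer.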

Answered by Prakash Vanapalli on July 9, 2021

The output dimension of a convolution in deep learning depends on multiple factors:

  1. the size of the filter (aka kernel)
  2. the padding (whether you add zeros or not around your image and how many)
  3. the number of filters that you use
  4. the stride

The simplest dependency is on the number of filters N, which gives the number of feature maps in the output. For the input, the channel count may be the RGB channels, i.e. 3; for the output, this number can be chosen freely.

The next factor is the zero-padding. If you use a filter size of (3,3) and "valid" padding, i.e. adding NO zeros around the image, you end up with an output of dimension:

(100, 100, 3) -> (98, 98, N)

This is because, with a stride of 1, a filter of width 3 moved along a 100-pixel axis hits the border after 98 positions (100 - 3 + 1 = 98), and the same holds in each direction.

However, if you use "SAME" padding, you compensate for the filter size (for a filter size of (3,3), that corresponds to a one-pixel border of zeros around the image), and you end up with:

(100, 100, 3) -> (100, 100, N)

With a stride of 2, for example, you shift the position of the filter by two pixels at each step. With "SAME" padding, you therefore get:

(100, 100, 3) -> (50, 50, N)
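The three cases above can be reproduced with a small helper. This is a sketch that assumes the TensorFlow-style meaning of "VALID" (no padding) and "SAME" (pad so that stride 1 preserves the size); the function name `conv_output_size` is made up for illustration.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding="VALID"):
    """Output length along one spatial axis of a convolution."""
    if padding == "VALID":
        # No zeros added: count the positions where the kernel fits entirely.
        return (input_size - kernel_size) // stride + 1
    if padding == "SAME":
        # Zeros added so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    raise ValueError(f"unknown padding: {padding!r}")

print(conv_output_size(100, 3, stride=1, padding="VALID"))  # 98
print(conv_output_size(100, 3, stride=1, padding="SAME"))   # 100
print(conv_output_size(100, 3, stride=2, padding="SAME"))   # 50
```

The number of filters N does not enter this calculation at all; it only sets the size of the last (channel) axis of the output.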

Answered by Sören on July 9, 2021

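The three RGB channels are each convolved with a different kernel, and the results are added together in each feature map. So, with N filters and padding that preserves the spatial size, you will have 100 x 100 x N as the output of the first layer.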

Answered by Jessé Andrade on July 9, 2021
