Data Science Asked on August 3, 2021
In PyTorch, we use:
nn.Conv2d(input_channel, output_channel, kernel_size)
in order to define the convolutional layers.
I understand that if the input is an image of size $\text{width} \times \text{height} \times 3$, we would set input_channel = 3. I am confused, however, about what to do if I have a data set with dimensions $3 \times 3 \times 30$ or $30 \times 4 \times 5$. Which number should I use to define input_channel for these?
Thanks in advance.
The defining factor is which dimensions you want your 2-dimensional convolution to sweep over, e.g.:
In images, you want the 2D convolution to sweep over the height and width dimensions, and the extra dimension (the color space) is the channels; for grayscale images, you have a single channel.
In a spectrogram, you want the 2D convolution to sweep over the time and frequency dimensions. As there are no further dimensions, there is only one channel, like with grayscale images.
In the cases you propose, e.g. "3 * 3 * 30", if we want the 2D convolution to sweep over the first two dimensions, then the number of input channels would be 30. If we wanted it to sweep over two other dimensions, then the remaining one would be the number of input channels. The same applies to "30 * 4 * 5".
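As a rough illustration of the image and spectrogram cases, here is a minimal sketch. The batch size, spatial sizes, and the choice of 16 output channels are made up for the example, and the tensors are assumed to already have the channel dimension right after the batch dimension, as PyTorch expects.

```python
import torch
import torch.nn as nn

# Hypothetical batch of 8 RGB images of size 32 x 32, channels first.
rgb = torch.randn(8, 3, 32, 32)
conv_rgb = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
rgb_out = conv_rgb(rgb)  # shape: (8, 16, 30, 30)

# Hypothetical batch of 8 spectrograms (64 frequency bins x 100 time steps).
# There is no channel dimension yet, so we add one of size 1.
spec = torch.randn(8, 64, 100)
spec = spec.unsqueeze(1)  # shape: (8, 1, 64, 100)
conv_spec = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3)
spec_out = conv_spec(spec)  # shape: (8, 16, 62, 98)
```

In both cases, in_channels is simply the size of the dimension the convolution does not sweep over.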
We should note, however, that 2D convolutions follow a strict convention in the ordering of dimensions. As described in the PyTorch documentation, the expected input layout is $(N, C_{in}, H, W)$, which means that we should rearrange the dimensions of our input tensor (e.g. with torch.Tensor.permute) so that the channel dimension comes right after the batch dimension and the dimensions over which we want the 2D convolution to sweep are the last 2.
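For the shapes in the question, this could look roughly like the sketch below. It assumes a batch of 8 samples, a 3 x 3 kernel, and 16 output channels, none of which come from the question, and it assumes we want the size-30 dimension to act as the channels.

```python
import torch
import torch.nn as nn

# Hypothetical batch of 8 samples, each 3 x 3 x 30 (size-30 axis last).
x = torch.randn(8, 3, 3, 30)

# Move the size-30 axis into the channel position:
# (N, H, W, C) -> (N, C, H, W)
x_nchw = x.permute(0, 3, 1, 2)  # shape: (8, 30, 3, 3)

conv = nn.Conv2d(in_channels=30, out_channels=16, kernel_size=3)
y = conv(x_nchw)  # shape: (8, 16, 1, 1)

# A 30 x 4 x 5 sample is already channels first if we convolve over
# the 4 x 5 dimensions, so no permute is needed.
z = torch.randn(8, 30, 4, 5)
conv_z = nn.Conv2d(in_channels=30, out_channels=16, kernel_size=3)
w = conv_z(z)  # shape: (8, 16, 2, 3)
```

If we instead wanted to sweep over a different pair of dimensions, we would change the permute order and set in_channels to the size of the remaining dimension.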
Correct answer by noe on August 3, 2021