Determining size of FC layer after Conv layer in PyTorch

Question

I am learning PyTorch and CNNs but am confused about how the number
of inputs to the first FC layer after a Conv2D layer is calculated.

My network architecture is shown below; here is my reasoning, using
the calculation as explained here.

The input images will have shape (1 x 28 x 28).

The first Conv layer has stride 1, padding 0, depth 6
and we use a (4 x 4) kernel. The output will thus be (6 x 24 x 24),
because the new volume is (28 - 4 + 2*0)/1.

Then we pool this with a (2 x 2) kernel and stride 2 so we get an output of (6 x 11 x 11),
because the new volume is (24 - 2)/2.

Same thing for the second Conv and pool layers, but this time with a (3 x 3) kernel in the Conv layer, resulting in (16 x 3 x 3) feature maps in the end.

My assumption would then be that the first linear layer should have 144 inputs (16 * 3 * 3),
but when I calculate the inputs programmatically, I get 400. What did I miss?

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 4)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(400, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, len(classes))

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]
        num_features = 1
        for s in size:
            num_features *= s
        return num_features # 400, not 144

Related, but less so: is there a rationale people use to choose a good kernel size, number of layers,
and number of pooling layers, or does everyone just look at what the SOTA papers do?

RaptorDotCpp · Accepted Answer

Hello and welcome to Stack Exchange!

The answer to your question is quite simple: you did not use the correct formula.

The formula you used is (assuming we are working with square inputs)

$$
W'=\frac{W-F+2P}{S}
$$

but the correct formula is

$$
W'=\frac{W-F+2P}{S}+1
$$

Now if we redo your calculations starting with $(1 \times 28 \times 28)$ inputs:

$$
\begin{aligned}
W^{(1)}&=28-4+1=25\\
W^{(2)}&=\left\lfloor\frac{25-2}{2}+1\right\rfloor=12\\
W^{(3)}&=12-3+1=10\\
W^{(4)}&=\left\lfloor\frac{10-2}{2}+1\right\rfloor=5
\end{aligned}
$$

Considering that the second convolution layer has 16 output channels (or feature maps), you can indeed then calculate the number of inputs as $16\cdot5^2=400$.
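
The same arithmetic can be checked with a few lines of plain Python. This is a minimal sketch, assuming square inputs and kernels, no dilation, and rounding down as in the calculation above; the helper name `conv_output_size` is made up for this illustration and is not part of PyTorch.

```python
def conv_output_size(w, f, s=1, p=0):
    """Spatial output size of a conv or pooling layer: floor((W - F + 2P) / S) + 1."""
    return (w - f + 2 * p) // s + 1


w = 28                             # input images are 1 x 28 x 28
w = conv_output_size(w, f=4)       # conv1: 4 x 4 kernel, stride 1, padding 0
w = conv_output_size(w, f=2, s=2)  # max pool: 2 x 2 kernel, stride 2
w = conv_output_size(w, f=3)       # conv2: 3 x 3 kernel, stride 1, padding 0
w = conv_output_size(w, f=2, s=2)  # max pool: 2 x 2 kernel, stride 2

fc_inputs = 16 * w * w             # 16 feature maps of size w x w
print(fc_inputs)                   # 400
```

Dropping the `+ 1` from the helper reproduces the 24, 11, 8, 3 sequence from the question, which is where the 144 came from.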

Michael Gardner · Answer

You can use torch.nn.AdaptiveMaxPool2d to set a specific output size.

For example, if I set nn.AdaptiveMaxPool2d((5,7)), I am forcing the output feature map to be 5 x 7. You can then multiply that by the out_channels of your previous Conv2d layer to get the number of inputs to the linear layer.

https://pytorch.org/docs/stable/nn.html#torch.nn.AdaptiveMaxPool2d

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 4)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.adapt = nn.AdaptiveMaxPool2d((5,7))
        self.fc1 = nn.Linear(16*5*7, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, len(classes))

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = self.adapt(F.relu(self.conv2(x)))
        x = x.view(-1, 16*5*7)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
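
To see how adaptive pooling can reach a fixed output size from any input size, here is a plain-Python sketch of the windowing rule as I understand it: output cell i along an axis covers the input indices from floor(i * in / out) up to, but not including, ceil((i + 1) * in / out). `adaptive_windows` is a hypothetical helper written for this illustration, not a PyTorch function, and the rule is an assumption that should be checked against the PyTorch documentation before relying on it.

```python
def adaptive_windows(in_size, out_size):
    """Return one (start, end) index pair per output cell; end is exclusive."""
    windows = []
    for i in range(out_size):
        start = (i * in_size) // out_size            # floor(i * in / out)
        end = -(-((i + 1) * in_size) // out_size)    # ceil((i + 1) * in / out)
        windows.append((start, end))
    return windows


# With 28 x 28 inputs, conv2 in the network above produces 16 x 10 x 10 maps.
rows = adaptive_windows(10, 5)   # 5 windows along the height
cols = adaptive_windows(10, 7)   # 7 windows along the width, some overlapping
print(rows)
print(cols)
print(16 * len(rows) * len(cols))  # 560, which equals 16*5*7
```

The number of windows depends only on the requested output size, so the flattened size stays at 16*5*7 even if the input image size changes.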

Anil Bora Yayak · Answer

I added a method to the PyTorch model that determines the number of input neurons of the first linear layer automatically. Hopefully it will be helpful for anyone struggling with the calculations.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # in_channels: color channels of the input; out_channels: number of filters
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(32, 64, 5)
        # computed once, after the conv/pool layers exist and before fc1 is built
        self.neurons = self.linear_input_neurons()

        self.fc1 = nn.Linear(self.neurons, 1000)
        self.fc2 = nn.Linear(1000, 500)
        # classes is assumed to be an int (number of output classes) defined elsewhere
        self.fc3 = nn.Linear(500, classes)

    def forward(self, x):
        x = self.maxpool(F.relu(self.conv1(x.float())))
        x = self.maxpool(F.relu(self.conv2(x.float())))
        x = x.view(-1, self.neurons)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)

        return x

    # Apply the convolution and pooling operations that precede the linear
    # layers and return the size of the resulting 4-dimensional tensor.
    def size_after_relu(self, x):
        x = self.maxpool(F.relu(self.conv1(x.float())))
        x = self.maxpool(F.relu(self.conv2(x.float())))

        return x.size()

    # Call the method above on a dummy input and multiply all elements of the returned size.
    def linear_input_neurons(self):
        size = self.size_after_relu(torch.rand(1, 1, 64, 32))  # image size: 64x32
        m = 1
        for i in size:
            m *= i

        return int(m)

komunistbakkal · Answer

If you are willing to pass additional input parameters to the CNN, you can calculate it automatically.
The input dimension for MNIST is input_dim=(1,28,28), so I can calculate it like this:

import torch
from torch import nn

import functools
import operator

class CNN(nn.Module):
    """Basic Pytorch CNN implementation"""

    def __init__(self, in_channels, out_channels, input_dim):
        nn.Module.__init__(self)
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=20, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),

            nn.Conv2d(in_channels=20, out_channels=50, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
        )

        num_features_before_fcnn = functools.reduce(operator.mul, list(self.feature_extractor(torch.rand(1, *input_dim)).shape))

        self.classifier = nn.Sequential(
            nn.Linear(in_features=num_features_before_fcnn, out_features=100),
            nn.Linear(in_features=100, out_features=out_channels),
        )

    def forward(self, x):
        batch_size = x.size(0)

        out = self.feature_extractor(x)
        out = out.view(batch_size, -1)  # flatten the vector
        out = self.classifier(out)
        return out
