Data Science Asked by Into Jo on May 21, 2021
I am learning PyTorch and CNNs, but I am confused about how the number
of inputs to the first FC layer after a Conv2d layer is calculated.
My network architecture is shown below; here is my reasoning, using
the calculation as explained here.
The input images will have shape (1 x 28 x 28).
The first Conv layer has stride 1, padding 0, depth 6
and we use a (4 x 4) kernel. The output will thus be (6 x 24 x 24),
because the new width is (28 - 4 + 2*0)/1 = 24.
Then we pool this with a (2 x 2) kernel and stride 2, so we get an output of (6 x 11 x 11),
because the new width is (24 - 2)/2 = 11.
Same thing for the second Conv and pool layers, but this time with a (3 x 3) kernel in the Conv layer, resulting in (16 x 3 x 3) feature maps in the end.
My assumption would then be that the first linear layer should have 144 inputs (16 * 3 * 3),
but when I calculate the inputs programmatically, I get 400. What did I miss?
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 4)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(400, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, len(classes))  # classes is defined elsewhere in my script

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features  # 400, not 144
Related, but less so: is there a principled way people choose the kernel size, the number of conv layers,
and the number of pooling layers, or does everyone just look at what the SOTA papers do?
Hello and welcome to Stack Exchange!
The answer to your question is quite simple: you did not use the correct formula.
The formula you used is (assuming we are working with square inputs)
$$ W' = \frac{W - F + 2P}{S} $$
but the correct formula is
$$ W' = \frac{W - F + 2P}{S} + 1 $$
Now if we redo your calculations starting with $(1 \times 28 \times 28)$ inputs:
$$ W^{(1)} = 28 - 4 + 1 = 25 $$
$$ W^{(2)} = \left\lfloor \frac{25 - 2}{2} + 1 \right\rfloor = 12 $$
$$ W^{(3)} = 12 - 3 + 1 = 10 $$
$$ W^{(4)} = \left\lfloor \frac{10 - 2}{2} + 1 \right\rfloor = 5 $$
Considering that the second convolution layer has 16 output channels (or feature maps), you can indeed then calculate the number of inputs as $16 \cdot 5^2 = 400$.
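If you would rather not do the arithmetic by hand, a minimal sketch (reusing the layer hyperparameters from your network) is to push a dummy input through the layers and print the intermediate shapes:

import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 6, 4)   # same layers as in the question
conv2 = nn.Conv2d(6, 16, 3)

x = torch.rand(1, 1, 28, 28)            # one dummy (1 x 28 x 28) image
x = F.max_pool2d(F.relu(conv1(x)), 2)
print(x.shape)                          # torch.Size([1, 6, 12, 12])
x = F.max_pool2d(F.relu(conv2(x)), 2)
print(x.shape)                          # torch.Size([1, 16, 5, 5]) -> 16 * 5 * 5 = 400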
Correct answer by RaptorDotCpp on May 21, 2021
You can use torch.nn.AdaptiveMaxPool2d to force a specific output size.
For example, if I set nn.AdaptiveMaxPool2d((5, 7)), I am forcing the output feature map to be 5 x 7, regardless of the input size. Then you can just multiply that by out_channels from your previous Conv2d layer.
https://pytorch.org/docs/stable/nn.html#torch.nn.AdaptiveMaxPool2d
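A quick sketch of that behavior (the input sizes below are arbitrary):

import torch
import torch.nn as nn

pool = nn.AdaptiveMaxPool2d((5, 7))
print(pool(torch.rand(1, 16, 10, 10)).shape)  # torch.Size([1, 16, 5, 7])
print(pool(torch.rand(1, 16, 24, 24)).shape)  # torch.Size([1, 16, 5, 7]), same output size

Applied to your network, that looks like this: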
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 4)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.adapt = nn.AdaptiveMaxPool2d((5, 7))
        self.fc1 = nn.Linear(16 * 5 * 7, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, len(classes))

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = self.adapt(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 7)  # the adaptive pool guarantees a 5 x 7 map per channel
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
Answered by Michael Gardner on May 21, 2021
I added a method to the PyTorch model that determines the number of input neurons for the first linear layer automatically; hopefully it will be helpful for anyone struggling with the calculation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # in_channels = color channels; out_channels = number of conv filters
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(32, 64, 5)
        self.neurons = self.linear_input_neurons()
        self.fc1 = nn.Linear(self.neurons, 1000)
        self.fc2 = nn.Linear(1000, 500)
        self.fc3 = nn.Linear(500, classes)

    def forward(self, x):
        x = self.maxpool(F.relu(self.conv1(x.float())))
        x = self.maxpool(F.relu(self.conv2(x.float())))
        x = x.view(-1, self.neurons)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    # Apply the convolutional part to a dummy input and return the 4-dimensional output size.
    def size_after_relu(self, x):
        x = self.maxpool(F.relu(self.conv1(x.float())))
        x = self.maxpool(F.relu(self.conv2(x.float())))
        return x.size()

    # Multiply all elements of the size obtained above; the dummy batch size is 1,
    # so the product equals the number of features per sample.
    def linear_input_neurons(self):
        size = self.size_after_relu(torch.rand(1, 1, 64, 32))  # image size: 64 x 32
        m = 1
        for i in size:
            m *= i
        return int(m)
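A quick usage sketch, assuming the global classes is defined before instantiation (the value below is hypothetical) and single-channel 64 x 32 inputs:

classes = 10  # hypothetical number of output classes
model = CNN()
print(model(torch.rand(8, 1, 64, 32)).shape)  # torch.Size([8, 10])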
Answered by Anil Bora Yayak on May 21, 2021
If you are willing to give an additional input parameter to the CNN, you can calculate it automatically.
The input dim for MNIST is input_dim=(1, 28, 28), so I can calculate it like this:
import torch
from torch import nn
import functools
import operator
class CNN(nn.Module):
    """Basic PyTorch CNN implementation"""

    def __init__(self, in_channels, out_channels, input_dim):
        nn.Module.__init__(self)
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=20, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(in_channels=20, out_channels=50, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
        )

        # Run a dummy input through the feature extractor and multiply the
        # dimensions of its output shape (the dummy batch size is 1).
        num_features_before_fcnn = functools.reduce(
            operator.mul, list(self.feature_extractor(torch.rand(1, *input_dim)).shape)
        )

        self.classifier = nn.Sequential(
            nn.Linear(in_features=num_features_before_fcnn, out_features=100),
            nn.Linear(in_features=100, out_features=out_channels),
        )

    def forward(self, x):
        batch_size = x.size(0)
        out = self.feature_extractor(x)
        out = out.view(batch_size, -1)  # flatten the feature maps
        out = self.classifier(out)
        return out
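A minimal usage sketch for MNIST-sized inputs (the class count of 10 is just an example):

model = CNN(in_channels=1, out_channels=10, input_dim=(1, 28, 28))
print(model(torch.rand(4, 1, 28, 28)).shape)  # torch.Size([4, 10])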
Answered by komunistbakkal on May 21, 2021