Data Science Asked on August 17, 2021
I’d like to be able to estimate whether a proposed model is small enough to be trained on a GPU with a given amount of memory.
If I have a simple CNN architecture like this:
Input: 50x50x3
C1: 32 3×3 kernels, with padding (I guess in reality they're actually 3x3x3 given the input depth?)
P1: 2×2 with stride 2
C2: 64 3×3 kernels, with padding
P2: 2×2 with stride 2
FC: 500 neurons
Output: softmax, 10 classes
Assuming 32-bit floating point values, how do you calculate the memory cost of each layer of the network during training, and then the total memory required to train such a model?
Maybe this link will give you an explanation of how to compute the memory usage of an arbitrary neural network. Further down in the linked page, the memory usage of the VGGNet model is worked through as an example.
Answered by Alexandru Burlacu on August 17, 2021
I will assume that by C1, C2, etc., you mean convolutional layers, by P1, P2 you mean pooling layers, and by FC you mean fully connected layers.
We can calculate the memory required for a forward pass like this:
One image
If you're working with float32 values, then following the link provided above by @Alexandru Burlacu you have:
Input: 50x50x3 = 7,500 = 7.5K
C1: 50x50x32 = 80,000 = 80K
P1: 25x25x32 = 20,000 = 20K
C2: 25x25x64 = 40,000 = 40K
P2: 12x12x64 = 9,216 = 9.2K <- This is a problem (and my approximation is a very hand-wavy guess here). Instead of working with 50, 25, '12.5', it would make more sense to work with multiples of 32. I've heard working with multiples of 32 is also more efficient from a memory standpoint. The reason a size like 50x50 is a bad idea is that 2x2 pooling doesn't divide the space evenly, as far as I can tell. Feel free to correct me if I'm wrong.
FC: 1x500 = 500 = 0.5K
Output: 1x10 = 10 = 0.01K (next to nothing)
Total memory: 7.5K + 80K + 20K + 40K + 9.2K + 0.5K + 0.01K = 157.2K values. 157.2K * 4 bytes = 628.8 KB
That's for one image.
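To make this concrete, here is a minimal Python sketch (my own illustration, not from the linked page) that reproduces the per-image counts above; the layer output shapes are hard-coded from the architecture in the question:

```python
# Per-image activation counts for the architecture in the question (sketch).
activations = {
    "Input":  50 * 50 * 3,    # 7,500
    "C1":     50 * 50 * 32,   # 80,000 (padding keeps the spatial size)
    "P1":     25 * 25 * 32,   # 20,000 (2x2 pool with stride 2 halves each dimension)
    "C2":     25 * 25 * 64,   # 40,000
    "P2":     12 * 12 * 64,   # 9,216  (floor(25 / 2) = 12)
    "FC":     500,
    "Output": 10,
}

total_values = sum(activations.values())  # 157,226 float32 values
total_bytes = total_values * 4            # 4 bytes per float32
print(f"{total_values:,} values -> {total_bytes / 1000:.1f} KB")  # ~628.9 KB
```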
Minibatch
If you're working with a minibatch size of 64, then you're reading 64 of these into memory at once and performing the operations all together, scaling everything up like this:
Input: 64x50x50x3 = 480,000 = 480K = 0.48M
C1: 64x50x50x32 = 5,120,000 = 5.12M
P1: 64x25x25x32 = 1,280,000 = 1.28M
C2: 64x25x25x64 = 2,560,000 = 2.56M
P2: 64x12x12x64 = 589,824 = 590K = 0.59M
FC: 64x500 = 32,000 = 32K = 0.032M
Output: 64x10 = 640 = 0.64K = 0.00064M (we don't care, this is tiny)
Total memory: ~10M values x 4 bytes ≈ 40 MB (approximate, because the linked page also gives an approximate value)
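The same scaling in code (a sketch continuing from the snippet above; only the batch dimension changes):

```python
# Scale the per-image activation count by the minibatch size (sketch).
batch_size = 64
per_image_values = 157_226                     # per-image total from above
batch_values = batch_size * per_image_values   # 10,062,464 values (~10M)
batch_bytes = batch_values * 4                 # float32 = 4 bytes each
print(f"{batch_bytes / 1e6:.1f} MB")           # ~40.2 MB, matching the ~40MB above
```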
EDIT: I misread the website, sorry.
According to the website, a backward pass requires about triple this, because of the need to store:
the activations and associated gradients for each neuron - these are of equal size;
the gradients of the weights (parameters) which are the same size as the parameters;
the value of the momentum, if you're using it;
some kind of miscellaneous memory (I don't fully understand this part).
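Applying that "about triple" rule of thumb to the minibatch figure above (my own arithmetic, a rough sketch rather than an exact accounting):

```python
# "About triple" rule of thumb for training memory (sketch).
forward_bytes = 10_062_464 * 4            # ~40 MB of minibatch activations
training_bytes = 3 * forward_bytes        # activations + gradients + bookkeeping
print(f"~{training_bytes / 1e6:.0f} MB during training")  # ~121 MB
```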
Answered by StatsSorceress on August 17, 2021
While training a ConvNet, the total memory required includes memory for the parameters, their gradients, and any optimizer state such as momentum.
A good rough approximation is: number of parameters x 3 x 4 bytes (if you are using 32-bit floats)
Now, this is how you calculate the number of parameters: a convolutional layer with k×k kernels, c_in input channels, and c_out output filters has (k x k x c_in + 1) x c_out parameters (the +1 is the bias); a fully connected layer with n_in inputs and n_out outputs has (n_in + 1) x n_out; pooling layers have none.
Just sum the parameters over all the layers and use the formula I mentioned.
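Applying those counts to the architecture in the question (a sketch of my own, with biases included; the final figure uses the rule of thumb above):

```python
# Parameter counts for the architecture in the question (sketch).
def conv_params(k, c_in, c_out):
    return (k * k * c_in + 1) * c_out   # k*k*c_in weights + 1 bias per filter

def fc_params(n_in, n_out):
    return (n_in + 1) * n_out           # weights + one bias per output

total = (
    conv_params(3, 3, 32)               # C1: 896
    + conv_params(3, 32, 64)            # C2: 18,496
    + fc_params(12 * 12 * 64, 500)      # FC: 4,608,500 (flattened P2 output)
    + fc_params(500, 10)                # Output: 5,010
)
print(f"{total:,} parameters")                                      # 4,632,902
print(f"~{total * 3 * 4 / 1e6:.0f} MB (parameters x 3 x 4 bytes)")  # ~56 MB
```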
Answered by Vijendra1125 on August 17, 2021