Data Science Asked on August 17, 2021
I’d like to be able to estimate whether a proposed model is small enough to be trained on a GPU with a given amount of memory.
If I have a simple CNN architecture like this:
Input: 50x50x3
C1: 32 3×3 kernels, with padding (I guess in reality they're actually 3x3x3 given the input depth?)
P1: 2×2 with stride 2
C2: 64 3×3 kernels, with padding
P2: 2×2 with stride 2
FC: 500 neurons
Output: softmax, 10 classes
Assuming 32-bit floating point values, how do you calculate the memory cost of each layer of the network during training, and then the total memory required to train such a model?
Maybe this link will give you an explanation of how to compute the memory usage of an arbitrary neural network. Further down in the linked page, the memory usage of the VGGNet model is worked through as an example.
Answered by Alexandru Burlacu on August 17, 2021
I will assume that by C1, C2, etc., you mean convolutional layers, by P1, P2 you mean pooling layers, and by FC you mean fully connected layers.
We can calculate the memory required for a forward pass like this:
One image
If you're working with float32 values, then following the link provided above by @Alexandru Burlacu you have:
Input: 50x50x3 = 7,500 = 7.5K
C1: 50x50x32 = 80,000 = 80K
P1: 25x25x32 = 20,000 = 20K
C2: 25x25x64 = 40,000 = 40K
P2: 12x12x64 = 9,216 = 9.2K <- This is a problem (and my approximation is a very hand-wavy guess here). Instead of working with 50, 25, '12.5', it would make more sense to work with multiples of 32. I've heard working with multiples of 32 is also more efficient from a memory standpoint. The reason a size like 50x50 is a bad idea is that 2x2 pooling doesn't divide the space evenly, as far as I can tell. Feel free to correct me if I'm wrong.
FC: 1x500 = 500 = 0.5K
Output: 1x10 = 10 = 0.01K (next to nothing)
Total memory: 7.5K + 80K + 20K + 40K + 9.2K + 0.5K + 0.01K = 157.2K values. 157.2K * 4 bytes = 628.8 KB
That's for one image.
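To make this concrete, here is a minimal Python sketch (my own illustration, not from the linked page) that reproduces the per-image counts above; the layer output shapes are hard-coded from the architecture in the question:

```python
# Per-image activation counts for the architecture in the question (sketch).
activations = {
    "Input":  50 * 50 * 3,    # 7,500
    "C1":     50 * 50 * 32,   # 80,000 (padding keeps the spatial size)
    "P1":     25 * 25 * 32,   # 20,000 (2x2 pool with stride 2 halves each dimension)
    "C2":     25 * 25 * 64,   # 40,000
    "P2":     12 * 12 * 64,   # 9,216  (floor(25 / 2) = 12)
    "FC":     500,
    "Output": 10,
}

total_values = sum(activations.values())  # 157,226 float32 values
total_bytes = total_values * 4            # 4 bytes per float32
print(f"{total_values:,} values -> {total_bytes / 1000:.1f} KB")  # ~628.9 KB
```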
Minibatch
If you're working with a minibatch size of 64, then you're reading 64 of these into memory at once and performing the operations all together, scaling everything up like this:
Input: 64x50x50x3 = 480,000 = 480K = 0.48M
C1: 64x50x50x32 = 5,120,000 = 5.12M
P1: 64x25x25x32 = 1,280,000 = 1.28M
C2: 64x25x25x64 = 2,560,000 = 2.56M
P2: 64x12x12x64 = 589,824 = 590K = 0.59M
FC: 64x500 = 32,000 = 32K = 0.032M
Output: 64x10 = 640 = 0.64K = 0.00064M (we don't care, this is tiny)
Total memory: ~10M values x 4 bytes ≈ 40 MB (approximate, because the linked page also gives an approximate value)
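The same scaling in code (a sketch continuing from the snippet above; only the batch dimension changes):

```python
# Scale the per-image activation count by the minibatch size (sketch).
batch_size = 64
per_image_values = 157_226                     # per-image total from above
batch_values = batch_size * per_image_values   # 10,062,464 values (~10M)
batch_bytes = batch_values * 4                 # float32 = 4 bytes each
print(f"{batch_bytes / 1e6:.1f} MB")           # ~40.2 MB, matching the ~40MB above
```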
EDIT: I misread the website, sorry.
According to the website, a backward pass requires about triple this, because of the need to store:
the activations and associated gradients for each neuron - these are of equal size;
the gradients of the weights (parameters) which are the same size as the parameters;
the value of the momentum, if you're using it;
some kind of miscellaneous memory (I don't fully understand this part).
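Applying that "about triple" rule of thumb to the minibatch figure above (my own arithmetic, a rough sketch rather than an exact accounting):

```python
# "About triple" rule of thumb for training memory (sketch).
forward_bytes = 10_062_464 * 4            # ~40 MB of minibatch activations
training_bytes = 3 * forward_bytes        # activations + gradients + bookkeeping
print(f"~{training_bytes / 1e6:.0f} MB during training")  # ~121 MB
```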
Answered by StatsSorceress on August 17, 2021
While training a ConvNet, the total memory required includes memory for the parameters, their gradients, and any optimizer state such as momentum.
A good rough approximation is: number of parameters x 3 x 4 bytes (if you are using 32-bit floats)
Now, this is how you calculate the number of parameters: a convolutional layer with k×k kernels, c_in input channels, and c_out output filters has (k x k x c_in + 1) x c_out parameters (the +1 is the bias); a fully connected layer with n_in inputs and n_out outputs has (n_in + 1) x n_out; pooling layers have none.
Just sum the parameters over all the layers and use the formula I mentioned.
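Applying those counts to the architecture in the question (a sketch of my own, with biases included; the final figure uses the rule of thumb above):

```python
# Parameter counts for the architecture in the question (sketch).
def conv_params(k, c_in, c_out):
    return (k * k * c_in + 1) * c_out   # k*k*c_in weights + 1 bias per filter

def fc_params(n_in, n_out):
    return (n_in + 1) * n_out           # weights + one bias per output

total = (
    conv_params(3, 3, 32)               # C1: 896
    + conv_params(3, 32, 64)            # C2: 18,496
    + fc_params(12 * 12 * 64, 500)      # FC: 4,608,500 (flattened P2 output)
    + fc_params(500, 10)                # Output: 5,010
)
print(f"{total:,} parameters")                                      # 4,632,902
print(f"~{total * 3 * 4 / 1e6:.0f} MB (parameters x 3 x 4 bytes)")  # ~56 MB
```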
Answered by Vijendra1125 on August 17, 2021