Is the Flatten() layer in Keras necessary?

Data Science, asked on August 18, 2021

In CNN transfer learning, after the convolution and pooling layers, is a Flatten() layer necessary?
I have seen an example where, after removing the top of a VGG16, the first layer applied was GlobalAveragePooling2D(), followed by Dense().
Is this specific to transfer learning?

This is the example without Flatten().

from tensorflow.keras.applications import MobileNet
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense

base_model = MobileNet(weights='imagenet', include_top=False)  # load MobileNet and discard the final 1000-class classification layer

x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)  # add dense layers so the model can learn more complex functions and classify better
x = Dense(1024, activation='relu')(x)  # dense layer 2
x = Dense(512, activation='relu')(x)   # dense layer 3
preds = Dense(3, activation='softmax')(x)  # final layer with softmax activation
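
For completeness, the head above can be turned into a trainable model following the same pattern as the Flatten() example below (a minimal sketch; freezing the base layers is an assumption, mirroring the second snippet):

from tensorflow.keras.models import Model

# freeze the pretrained base so only the new head is trained
for layer in base_model.layers:
    layer.trainable = False

model = Model(inputs=base_model.input, outputs=preds)
model.summary()  # GlobalAveragePooling2D adds no parameters; the first Dense layer sees one value per channel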

This example is with Flatten().

from glob import glob
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.models import Model

# IMAGE_SIZE is assumed to be defined earlier, e.g. IMAGE_SIZE = [224, 224]
vgg = VGG16(input_shape=IMAGE_SIZE + [3], weights='imagenet', include_top=False)

# don't train the existing weights
for layer in vgg.layers:
    layer.trainable = False

# useful for getting the number of classes
folders = glob('Datasets/Train/*')

# our layers - you can add more if you want
x = Flatten()(vgg.output)
# x = Dense(1000, activation='relu')(x)
prediction = Dense(len(folders), activation='softmax')(x)

# create a model object
model = Model(inputs=vgg.input, outputs=prediction)

What is the difference if both can be applied?

2 Answers

Although the other answer has explained the difference, I will add a few more points.

If the model is very deep (i.e. it contains a lot of pooling), the feature maps become very small, e.g. shrinking from 300 x 300 to 5 x 5. At that point the information is dispersed across the different feature maps, and the individual elements of any one feature map no longer hold much information.
Global average pooling reduces each of these maps to a single value, which greatly reduces the number of parameters once the output is joined with a Dense layer.

Excerpt from Hands-On Machine Learning by Aurélien Géron:
the global average pooling layer outputs the mean of each feature map: this drops any remaining spatial information, which is fine because there was not much spatial information left at that point. Indeed, GoogLeNet input images are typically expected to be 224 × 224 pixels, so after 5 max pooling layers, each dividing the height and width by 2, the feature maps are down to 7 × 7. Moreover, it is a classification task, not localization, so it does not matter where the object is. Thanks to the dimensionality reduction brought by this layer, there is no need to have several fully connected layers at the top of the CNN (like in AlexNet), and this considerably reduces the number of parameters in the network and limits the risk of overfitting.
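
To make the parameter saving concrete, here is a minimal sketch; the 7 x 7 x 512 feature-map size (roughly what VGG16 produces for 224 x 224 inputs) and the Dense width of 256 are illustrative assumptions only:

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Flatten, GlobalAveragePooling2D, Dense

features = Input(shape=(7, 7, 512))  # illustrative feature-map size

# Flatten head: the Dense layer sees 7*7*512 = 25088 inputs -> ~6.4M weights
flat_head = Dense(256, activation='relu')(Flatten()(features))

# Global average pooling head: the Dense layer sees only 512 inputs -> ~131K weights
gap_head = Dense(256, activation='relu')(GlobalAveragePooling2D()(features))

Model(features, flat_head).summary()
Model(features, gap_head).summary()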

Is this specific to transfer learning?

You can apply this concept to your own models too and compare the results/parameter counts for different cases, i.e. a small vs. a large model.
In any case, transfer learning is just a special case of training a neural network, i.e. you continue from an already trained model.


Adding a depiction of the difference in approach: [image]

Correct answer by 10xAI on August 18, 2021

With GlobalAveragePooling2D, each feature map is reduced to a single value by averaging all of its elements.
E.g. if the input to the global average pooling layer is 220 x 220 x 30, the output will be 1 x 1 x 30.
In other words, you extract one global, representative feature from every slice; that is global average pooling.
No, this isn't specific to transfer learning. It is used over the feature maps in the classification head, where it is easier to interpret and less prone to overfitting than a normal fully connected layer.

Flattening, on the other hand, simply converts the multi-dimensional feature maps into a single dimension, without any kind of feature selection or aggregation.
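
A quick shape check illustrates the point (a minimal sketch; the 220 x 220 x 30 input is just the example size from above):

import numpy as np
from tensorflow.keras.layers import Flatten, GlobalAveragePooling2D

feature_maps = np.zeros((1, 220, 220, 30), dtype='float32')   # (batch, height, width, channels)

print(GlobalAveragePooling2D()(feature_maps).shape)  # (1, 30): one averaged value per feature map
print(Flatten()(feature_maps).shape)                 # (1, 1452000): all 220*220*30 values kept as-is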

Answered by ashraful16 on August 18, 2021
