Data Science: Asked by kfx on March 16, 2021
I have a Keras Xception-based model for gesture recognition. The accuracy of the model is around 60-70% for classifying 7 different gestures. The training dataset consists of 320×240 and 640×480 pixel images. Currently, I'm leaving the `input_shape` parameter of the model at the default value for the Xception model in Keras, which is `(299, 299, 3)`. I assume that under the hood the network rescales all inputs to 299×299 pixels, which probably isn't a good approach.
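As a quick check (not part of the original post), the default input size of the pretrained Keras Xception model can be confirmed directly:

```python
import tensorflow as tf

# With the default include_top=True, the ImageNet-pretrained Xception
# is built for a fixed (299, 299, 3) input.
model = tf.keras.applications.Xception(weights="imagenet")
print(model.input_shape)  # (None, 299, 299, 3)
```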
My questions are:

1. Is the Xception network optimized for the 299×299 input size, or can other input sizes be used just as well?
2. Why do the input height and width have to be equal?
3. Given images of two different resolutions, should I rescale them all to the smaller size or to the larger one?
For your first question: yes, it is optimized for that size, since the original Xception paper used 299×299 inputs. But you can use other sizes. Resizing your images to 299×299 would be the best option.
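As a sketch of that resizing step, assuming the training data is available as a tf.data.Dataset of (image, label) pairs (the pipeline below is hypothetical, not taken from the question):

```python
import tensorflow as tf
from tensorflow.keras.applications.xception import preprocess_input

TARGET_SIZE = (299, 299)  # input size used by the original Xception paper

def prepare(image, label):
    # Resize every image (320x240 or 640x480) to the 299x299 input the
    # pretrained Xception weights expect, then apply Xception preprocessing
    # (scales pixel values to the [-1, 1] range).
    image = tf.image.resize(image, TARGET_SIZE)
    image = preprocess_input(image)
    return image, label

# Hypothetical pipeline; replace train_ds with your own dataset.
# train_ds = train_ds.map(prepare).batch(32)
```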
For your second question, the reason height = width is that the convolutional filters used in the network are square (3×3 filters). The reason for using square filters in computer vision is that we assume image features are, most of the time, symmetric in the two spatial dimensions (an exception being text, where the information lies more along the vertical dimension than the horizontal; 1×2 filters are used there).
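For illustration only (these layers are not taken from Xception itself), the filter shape in Keras is set with the kernel_size argument of Conv2D, so square and rectangular filters look like this:

```python
import tensorflow as tf

# Square 3x3 kernel, the usual choice for natural images.
square_conv = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), padding="same")

# Rectangular 1x2 kernel, as sometimes used when features are elongated
# in one direction (e.g. text).
rect_conv = tf.keras.layers.Conv2D(filters=32, kernel_size=(1, 2), padding="same")

# Both accept the same 4D input: (batch, height, width, channels).
x = tf.random.normal((1, 240, 320, 3))
print(square_conv(x).shape)  # (1, 240, 320, 32)
print(rect_conv(x).shape)    # (1, 240, 320, 32)
```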
For your third question, go for the smaller size. If you upscale the smaller images to the bigger size, you add no useful information, since the extra pixels are interpolated from the smaller image itself. You also end up with a model that has more parameters.
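A minimal sketch of that option, assuming include_top=False so that a custom input shape and a 7-class head can be attached (the head itself is an assumption, not the asker's actual model):

```python
import tensorflow as tf

SMALL_SIZE = (240, 320)  # (height, width) of the smaller images in the dataset

# Downscale the 640x480 images to the smaller resolution instead of
# upscaling the small ones; interpolation cannot add real information.
def downscale(image, label):
    return tf.image.resize(image, SMALL_SIZE), label

# Xception base with a custom input shape; include_top=False drops the
# original ImageNet classification head, so any size >= 71x71 is accepted.
base = tf.keras.applications.Xception(
    weights="imagenet",
    include_top=False,
    input_shape=(SMALL_SIZE[0], SMALL_SIZE[1], 3),
    pooling="avg",
)

# 7-class gesture head on top of the pretrained base.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(7, activation="softmax"),
])
```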
Answered by Abhishek Verma on March 16, 2021