How is the output of a maxpool layer window size=1x2 and stride=2 calculated?

Question

I'm looking at the architecture proposed in the following paper: Baoguang Shi et al, An End-to-End Trainable Neural Network for Image-based Sequence
Recognition and Its Application to Scene Text Recognition.

In the proposed architecture of the model, a MaxPooling Window:1 × 2, s:2 layer is mentioned. I'm not sure what the size of the output of this layer would be.

If i have an input of size (32 x 8), then the output would be:

(32-1)/2 + 1 = 16.5, <- this part doesn't make sense to me

(8-2)/2 + 1 = 4

*ignoring depth and batch size here

Fortune Seeker · Answer

According to the paper, maybe "s" represents stride in row, while the stride in column equals 1.

How is the output of a maxpool layer window size=1x2 and stride=2 calculated?

One Answer

Add your own answers!

Ask a Question