TransWikia.com

How is the output of a maxpool layer window size=1x2 and stride=2 calculated?

Data Science Asked by John T. Copeland on August 30, 2021

I’m looking at the architecture proposed in the following paper: Baoguang Shi et al., "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition".

In the proposed architecture of the model, a MaxPooling Window:1 × 2, s:2 layer is mentioned. I’m not sure what the size of the output of this layer would be.
[Figure: Architecture of model]

If I have an input of size (32 x 8), then the output would be:

(32-1)/2 + 1 = 16.5, <- this part doesn’t make sense to me

(8-2)/2 + 1 = 4

*ignoring depth and batch size here
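
For reference, a common convention is that a pooling layer without padding rounds the quotient down, so a trailing half step is dropped. The paper does not state this, so it is an assumption here. Under it, with n the input size, k the window size and s the stride along one axis, the arithmetic above would work out as:

```latex
\[
\text{out} = \left\lfloor \frac{n - k}{s} \right\rfloor + 1,
\qquad
\left\lfloor \frac{32 - 1}{2} \right\rfloor + 1 = 15 + 1 = 16,
\qquad
\left\lfloor \frac{8 - 2}{2} \right\rfloor + 1 = 3 + 1 = 4 .
\]
```

This only shows how a non-integer result is usually resolved; it does not settle which stride the paper intends for each axis.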

One Answer

Judging from the paper, "s" may denote the stride along the rows only, with the stride along the columns equal to 1.
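
To make the two readings concrete, here is a minimal sketch in plain Python. It assumes the usual floor convention with no padding, and it assumes that the stride of 2 applies to the axis whose window size is 2. Neither assumption is confirmed by the paper, and the helper names `pool_output_size` and `pool_output_shape` are made up for illustration.

```python
def pool_output_size(size, window, stride, padding=0):
    """Output length along one axis, using the floor convention."""
    return (size + 2 * padding - window) // stride + 1


def pool_output_shape(in_shape, window, stride):
    """Apply the one-axis formula independently to each axis."""
    return tuple(
        pool_output_size(n, k, s) for n, k, s in zip(in_shape, window, stride)
    )


# Input from the question, with the window read as (1, 2).
in_shape = (32, 8)
window = (1, 2)

# Reading A: stride 2 on both axes; the floor turns 15.5 into 15.
same_stride = pool_output_shape(in_shape, window, (2, 2))

# Reading B (assumed): stride 1 on the size-1 axis, stride 2 on the other.
per_axis_stride = pool_output_shape(in_shape, window, (1, 2))

print(same_stride)      # (16, 4)
print(per_axis_stride)  # (32, 4)
```

Under reading B every division is exact, so the 16.5 from the question never arises and only the axis with the size-2 window is halved.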

Answered by Fortune Seeker on August 30, 2021
