Data Science Asked on December 26, 2021
I’m looking at the architecture proposed in the following paper: Baoguang Shi et al, An End-to-End Trainable Neural Network for Image-based Sequence
Recognition and Its Application to Scene Text Recognition.
In the proposed architecture of the model, a MaxPooling Window:1 × 2, s:2 layer is mentioned. I’m not sure what the size of the output of this layer would be.
If i have an input of size (32 x 8), then the output would be:
(32-1)/2 + 1 = 16.5, <- this part doesn’t make sense to me
(8-2)/2 + 1 = 4
*ignoring depth and batch size here
According to the paper, maybe "s" represents stride in row, while the stride in column equals 1.
Answered by Fortune Seeker on December 26, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP