How does a multi-scale CNN select the final output map?

Question

A few days ago I read about the multi-scale CNN (the OverFeat method); the presentation is available via this link. They ran the CNN on different scales of an image and then combined all the output maps. The presentation says:

Classification performed at 6 scales at test time, but only 1 scale at run time.

So my question is: if we use 6 different scales with the CNN architecture, then (I guess) we have different convolution layers at every scale. So how does OverFeat use only 1 scale at run time? If we use one specific scale, how can we access the feature extractors of the other scales? I also see that in the article they combine feature maps of different scales, but I can't figure out how this is done.

Vivek Khetan · Answer

Think of it as using filters of varied size and varied values. Each one extracts a different representation (that is, captures a different part of the image), and the results are stacked into a bigger feature vector. Then you do the featurisation. Also, look at dilated CNNs used for NLP; they are based on somewhat similar concepts.
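To make the stacking idea concrete, here is a minimal pure-Python sketch, not OverFeat's actual pipeline. It assumes the same small filters are reused at every scale (running a fixed filter on a downscaled image acts like a larger filter on the original), and that each output map is reduced to one number with a global max before stacking. The helper names (`downscale`, `conv2d_valid`, `global_max`, `multiscale_features`), the toy image, and the two edge filters are all made up for illustration.

```python
def downscale(image, factor):
    """Nearest-neighbour downscale: keep every `factor`-th row and column."""
    return [row[::factor] for row in image[::factor]]


def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation of a 2D list with a 2D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def global_max(feature_map):
    """Collapse a feature map of any size to its single strongest response."""
    return max(max(row) for row in feature_map)


def multiscale_features(image, kernels, factors=(1, 2)):
    """Apply the SAME kernels at every scale and stack the pooled responses."""
    features = []
    for factor in factors:
        scaled = downscale(image, factor)
        for kernel in kernels:
            features.append(global_max(conv2d_valid(scaled, kernel)))
    return features


# Hypothetical toy input: an 8x8 image with a single vertical edge.
image = [[0, 0, 0, 0, 1, 1, 1, 1] for _ in range(8)]
kernels = [
    [[-1, 1], [-1, 1]],   # responds to vertical edges
    [[-1, -1], [1, 1]],   # responds to horizontal edges
]

features = multiscale_features(image, kernels)
print(len(features), features)
```

The stacked vector has one entry per (scale, filter) pair, and because the filters do not depend on the input size, nothing has to be retrained to add or drop a scale. Instead of concatenating, the per-scale vectors could also be averaged; which combination rule is appropriate depends on the model.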

Answered by Vivek Khetan on January 31, 2021
