CNN: Details of Zeiler Fergus Net

Cross Validated: asked by vrx on December 26, 2020

I want to replicate the modified AlexNet by Zeiler and Fergus from 2013 (Visualizing and Understanding Convolutional Networks), but the paper leaves out some details. I hope someone here knows more about it.

  1. What is their exact learning rate schedule? They just write “We
    anneal the learning rate throughout training manually when the
    validation error plateaus”.

  2. Do they use weight decay?

  3. In which layers do they “renormalize” the filters (they do not
    divide the input by the global standard deviation)?

  4. I do not fully understand their architecture. In the first
    layer, the input goes from 224 to 110 with 7×7 filters and stride 2.
    Do they add one pixel of padding on only one side, since
    (110 − 1)·2 + 7 = 225, or am I wrong? The same question applies to
    the 3×3 max pooling from 26 to 13 with stride 2.
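Regarding question 3: assuming "renormalize" means capping the size of each convolutional filter, a minimal sketch rescales every filter whose root-mean-square (RMS) weight exceeds a fixed radius back down to that radius and leaves smaller filters untouched. The function names, the flat-list filter layout and the radius used in the example are illustrative assumptions; the paper is the place to check the actual radius and the layers it was applied to.

```python
import math


def rms(w):
    """Root-mean-square of a flat list of weights."""
    return math.sqrt(sum(x * x for x in w) / len(w))


def renormalize_filters(filters, radius):
    """Rescale each filter whose RMS exceeds `radius` so that its RMS equals `radius`.

    `filters` is a list of filters, each given as a flat list of weights
    (a hypothetical layout chosen for this sketch). Filters whose RMS is
    at or below `radius` are returned unchanged (copied).
    """
    out = []
    for w in filters:
        r = rms(w)
        if r > radius:
            # Shrink the whole filter uniformly; its direction is preserved.
            scale = radius / r
            out.append([x * scale for x in w])
        else:
            out.append(list(w))
    return out


# Example: the first filter is too large and gets shrunk, the second is kept.
filters = renormalize_filters([[3.0, 4.0], [0.01, -0.01]], radius=0.1)
```

In training, a function like this would be called after each parameter update, on the weights of whichever layers the constraint is meant to cover.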

One Answer

A partial answer:

1-) This is a common training procedure: when the current learning rate can no longer reduce the objective function, the learning rate is reduced and training continues. The reason is similar to overshooting: after some time, the learning rate becomes too large for the updates to reduce the error any further, so it is lowered by some amount. The simplest way is to divide the learning rate by a constant, e.g. 5 or 10.
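The manual procedure described above can be automated. The sketch below is a hypothetical helper, not the authors' schedule (they annealed by hand): it divides the learning rate by `factor` whenever the validation error has failed to improve on its best value for `patience` consecutive evaluations. The class name and the default values of `factor` and `patience` are assumptions for illustration.

```python
class PlateauAnnealer:
    """Divide the learning rate by `factor` when the validation error plateaus."""

    def __init__(self, lr, factor=10.0, patience=3, min_delta=0.0):
        self.lr = lr
        self.factor = factor          # e.g. 5 or 10, as suggested above
        self.patience = patience      # evaluations without improvement before annealing
        self.min_delta = min_delta    # minimum decrease that counts as an improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_error):
        """Record one validation error and return the learning rate to use next."""
        if val_error < self.best - self.min_delta:
            # Improvement: remember it and reset the plateau counter.
            self.best = val_error
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                # Plateau detected: anneal the learning rate and start counting again.
                self.lr /= self.factor
                self.bad_epochs = 0
        return self.lr


sched = PlateauAnnealer(lr=0.01, factor=10.0, patience=2)
lrs = [sched.step(e) for e in [0.5, 0.4, 0.41, 0.40, 0.35]]
# The error stalls at 0.41 and 0.40, so the rate drops from 0.01 to 0.001.
```

PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau` implements the same idea with more options, if you are using that framework.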

2-) I think they did, because AlexNet used it and most of their settings are taken from AlexNet.
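For reference, AlexNet trained with SGD using momentum 0.9 and weight decay 0.0005, with the decay folded into the velocity update as a term proportional to the weight itself. The minimal sketch below implements one such step on plain Python lists; the function name is hypothetical, and using AlexNet's 0.0005 for the Zeiler and Fergus model is only an assumption, since the decay value is exactly what the question says is missing.

```python
def sgd_momentum_weight_decay(w, v, grad, lr, momentum=0.9, weight_decay=5e-4):
    """One AlexNet-style update step on flat lists of weights.

    v <- momentum * v - weight_decay * lr * w - lr * grad
    w <- w + v
    Returns the new weights and the new velocity.
    """
    new_v = [momentum * vi - weight_decay * lr * wi - lr * gi
             for wi, vi, gi in zip(w, v, grad)]
    new_w = [wi + vi for wi, vi in zip(w, new_v)]
    return new_w, new_v


# With a zero gradient, only the decay term acts, pulling each weight toward zero.
w, v = sgd_momentum_weight_decay([1.0, -2.0], [0.0, 0.0], [0.0, 0.0], lr=0.01)
```

The zero-gradient call isolates the regularising effect: each weight shrinks by a fraction `weight_decay * lr` of its own value.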


4-) During pooling, padding may be used so that the pooling windows cover the part of the input they would otherwise leave out. For example, 3x3 pooling with stride 2 on a 26x26 input should be padded by 1 on one side.
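The arithmetic in question 4 and in this answer can be checked with the usual output-size formula, out = floor((n + padding − k) / s) + 1. The sketch below (function names are illustrative) shows that without padding the 7×7, stride-2 convolution gives 109 rather than 110 and the 3×3, stride-2 pooling gives 12 rather than 13, while one extra pixel on one side gives the sizes from the paper. Note that one pixel on each side (the last padded column is simply never reached) or rounding the division up instead of down also yields 110 and 13, so the sizes alone do not reveal which convention the authors' code used; frameworks differ here (Caffe, for example, rounds pooling output sizes up).

```python
import math


def out_size(n, k, s, pad_before=0, pad_after=0, ceil_mode=False):
    """Output length of a conv/pool window of size k and stride s over n inputs."""
    span = n + pad_before + pad_after - k
    steps = math.ceil(span / s) if ceil_mode else span // s
    return steps + 1


def covered_input(out, k, s):
    """Input length spanned by `out` windows of size k and stride s."""
    return (out - 1) * s + k


print(out_size(224, 7, 2))                             # 109: one short of 110
print(out_size(224, 7, 2, pad_after=1))                # 110
print(covered_input(110, 7, 2))                        # 225 = 224 + 1
print(out_size(26, 3, 2))                              # 12: one short of 13
print(out_size(26, 3, 2, pad_after=1))                 # 13
print(out_size(26, 3, 2, pad_before=1, pad_after=1))   # 13
print(out_size(26, 3, 2, ceil_mode=True))              # 13
```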

Answered by yasin.yazici on December 26, 2020