Data Science · Asked by Alexander Engelhardt on June 25, 2021
In the residual learning paper by He et al., there are a number of plots of training/test error versus backprop iteration. I’ve only ever seen smooth curves in such plots, but in this paper there are sudden jumps in improvement. In Fig. 4a of the paper, these occur at around 15e4 and 30e4 iterations.
What happened here? My intuition says that the optimization hit a plateau with a gradient close to zero and then very suddenly found a steep path downward, but is that a realistic shape for a cost function?
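To make the plateau intuition concrete, here is a minimal toy sketch, not the paper's setup: plain gradient descent on a made-up one-dimensional loss with a long, nearly flat shelf followed by a steep cliff. The loss function, learning rate, and step count are all invented for illustration. The resulting curve stays almost constant for thousands of steps and then drops sharply, which is the kind of shape the plateau hypothesis would predict:

```python
# Toy illustration only: a 1-D loss with a long plateau and a sudden drop.
# While x << 5 the gradient is nearly zero, so gradient descent barely moves;
# once x nears 5 the gradient grows and the loss falls off a cliff.
import numpy as np
import matplotlib.pyplot as plt

def loss(x):
    return 1.0 - np.tanh(x - 5.0)          # ~2 on the plateau, ~0 after the drop

def grad(x):
    return -(1.0 - np.tanh(x - 5.0) ** 2)  # derivative of loss; tiny while x << 5

x, lr = 0.0, 1.0                            # hypothetical start point and step size
history = []
for step in range(4000):
    history.append(loss(x))
    x -= lr * grad(x)                       # standard gradient-descent update

plt.plot(history)
plt.xlabel("iteration")
plt.ylabel("loss")
plt.title("Plateau followed by a sudden drop (toy 1-D example)")
plt.show()
```

So a flat-then-sudden-drop loss curve is at least mechanically possible under pure gradient descent; whether that is what actually produced the jumps in Fig. 4a is the question.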