Data Science Asked on December 16, 2021
I have a dataset with at least 70% of labels incorrect.
I’d expect the incorrect labels to cancel each other out, while the true labels would be learned properly (given a very large dataset).
For example, if I have 300 samples saying a => -1 and 300 samples saying a => 1, the prediction for the input "a" should eventually be 0 (for a regression problem).
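As a quick sanity check of this intuition (a hypothetical toy setup, not from the question itself): under squared-error loss, the optimal constant prediction for an input is the mean of its labels, so a symmetric 300/300 split cancels out exactly.

```python
# Hypothetical toy setup: one input "a" whose 600 labels are split
# evenly between -1 and +1 (symmetric label noise).
labels = [-1.0] * 300 + [1.0] * 300

# Under squared-error (MSE) loss, the optimal constant prediction is
# the label mean, so the noisy labels cancel out exactly.
best_prediction = sum(labels) / len(labels)
print(best_prediction)  # 0.0
```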
If I use Adam for the example above, won’t its adaptive nature skew the results for inputs with noisy labels? Wouldn’t it be better to use SGD with learning-rate decay instead, or does Adam only update its weights at the end of every epoch?
Adam works the same way as SGD in this regard: it updates the weights at the end of each iteration (i.e., after each mini-batch), so by the end of an epoch multiple weight updates have been applied.
Inherently, neither Adam nor SGD does anything to counteract noisy labels; they simply try to find the parameters that minimize a loss function. I don't think anyone can say a priori whether Adam or SGD will work better for your problem.
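To illustrate this point, here is a minimal sketch (my own toy setup, not from the post) fitting a single parameter, the prediction for input "a", to the 300/300 split with plain mini-batch SGD and a hand-rolled Adam using its standard defaults. Both optimizers update once per mini-batch, and both end up near the label mean of 0, since neither does anything special about the noise.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical dataset: a single repeated input whose labels are split
# evenly between -1 and +1 (symmetric label noise).
y = rng.permutation(np.array([-1.0] * 300 + [1.0] * 300))

def train(optimizer, epochs=200, batch=32, lr=0.05):
    w = 0.5          # single parameter: the model's prediction for input "a"
    m = v = 0.0      # Adam's first/second moment estimates
    t = 0            # Adam's step counter
    for _ in range(epochs):
        for i in range(0, len(y), batch):
            # MSE gradient for this mini-batch: d/dw mean((w - y_b)^2)
            g = 2.0 * (w - y[i:i + batch].mean())
            t += 1
            if optimizer == "sgd":
                w -= lr * g
            else:  # Adam with standard defaults beta1=0.9, beta2=0.999
                m = 0.9 * m + 0.1 * g
                v = 0.999 * v + 0.001 * g * g
                m_hat = m / (1 - 0.9 ** t)
                v_hat = v / (1 - 0.999 ** t)
                w -= lr * m_hat / (np.sqrt(v_hat) + 1e-8)
    return w

# Both optimizers settle close to the label mean (0); neither "filters"
# the noisy labels, they just minimize the same loss.
print(train("sgd"), train("adam"))
```

With a constant learning rate both estimates hover near 0 rather than converging exactly, which is why learning-rate decay is commonly added to either optimizer; that choice is separate from the SGD-vs-Adam question.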
Answered by Djib2011 on December 16, 2021