Relation between test and train error with gradient descent iterates

Cross Validated · Asked by sgg on January 16, 2021

My question is about establishing an inequality between the population error and the expected training error (i.e., expected training error $\le$ population error) for a model trained with gradient descent on a specific loss (not necessarily to convergence).

Assume we have training data $(X,Y) = \{(x_1,y_1), \ldots, (x_n,y_n)\}$ sampled i.i.d. from an unknown distribution $P$. Say we do ERM with a loss function $\ell$ and obtain $\hat f = \operatorname{argmin}_{f \in \mathcal{F}} \frac{1}{n}\sum_i \ell(f(x_i),y_i)$. With $f^*$, we denote the true minimizer of the population loss. From the basic ERM inequality, we have
$$\frac{1}{n}\sum_i \ell(\hat f(x_i),y_i) \le \frac{1}{n}\sum_i \ell(f^*(x_i),y_i).$$
Taking expectations on both sides, we have
$$\mathbb{E}\left[\frac{1}{n}\sum_i \ell(\hat f(x_i),y_i)\right] \le \mathbb{E}\left[\frac{1}{n}\sum_i \ell(f^*(x_i),y_i)\right].$$
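
Since $f^*$ is fixed (it does not depend on the sample), each term on the right-hand side has the same expectation, so the right-hand side is exactly the population error of $f^*$:
$$\mathbb{E}\left[\frac{1}{n}\sum_i \ell(f^*(x_i),y_i)\right] = \mathbb{E}_{(X,Y)\sim P}\big[\ell(f^*(X),Y)\big],$$
which is the sense in which "expected training error $\le$ population error" above.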

Elaborating, my question is about obtaining a similar inequality for a model $\tilde f_t$ obtained after taking $t$ gradient steps with learning rate $\eta$ to minimize the same loss, i.e., $\frac{1}{n}\sum_i \ell(f(x_i),y_i)$.
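
To make the object of the question concrete, here is a minimal numerical sketch (my own illustration, not part of the question): a linear model with squared loss, where a large held-out sample stands in for the population loss under $P$. The data-generating process and the values of `eta` and `T` are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: y = <w_true, x> + noise, squared loss.
d, n, n_test = 5, 50, 100_000
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.5 * rng.normal(size=n)
# A large fresh sample approximates the population loss under P.
X_test = rng.normal(size=(n_test, d))
y_test = X_test @ w_true + 0.5 * rng.normal(size=n_test)

def loss(w, X, y):
    return np.mean((X @ w - y) ** 2)

# Gradient descent on the empirical loss (not run to convergence).
eta, T = 0.01, 200
w = np.zeros(d)
for t in range(1, T + 1):
    grad = (2.0 / n) * X.T @ (X @ w - y)  # gradient of (1/n) sum_i (x_i.w - y_i)^2
    w -= eta * grad
    if t % 50 == 0:
        print(f"t={t:4d}  train={loss(w, X, y):.4f}  population~{loss(w, X_test, y_test):.4f}")
```

Tracking the two printed columns over $t$ shows the gap the question asks about: whether (and for which $t$, $\eta$) the expected training loss of the iterate $\tilde f_t$ can be bounded by the population error, as in the ERM case above.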
