Cross Validated question, asked by sgg on January 16, 2021
My question is about establishing an inequality between the population error and the expected training error (i.e., expected training error $\le$ population error) for a model trained with gradient descent on a specific loss, not necessarily until convergence.
Assume we have training data $(X,Y) = \{(x_1,y_1), \ldots, (x_n,y_n)\}$ sampled i.i.d. from an unknown distribution $P$. Say we do ERM with a loss function $\ell$ and obtain $\hat f = \arg\min_{f\in\mathcal{F}} \frac{1}{n}\sum_i \ell(f(x_i),y_i)$. With $f^*$, we denote the minimizer of the population loss. From the basic ERM inequality, we have
$$\frac{1}{n}\sum_i \ell(\hat f(x_i),y_i) \le \frac{1}{n}\sum_i \ell(f^*(x_i),y_i)\,.$$
Taking expectations over the draw of the training sample on both sides, we have
$$\mathbb{E}\left[\frac{1}{n}\sum_i \ell(\hat f(x_i),y_i)\right] \le \mathbb{E}\left[\frac{1}{n}\sum_i \ell(f^*(x_i),y_i)\right]\,.$$
Since $f^*$ does not depend on the sample, the right-hand side equals the population risk of $f^*$, so the expected training error of $\hat f$ is bounded by the population error.
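To make the ERM inequality above concrete, here is a minimal Monte Carlo sketch, assuming a well-specified linear model with squared loss so that both the empirical minimizer $\hat f$ (ordinary least squares) and the population minimizer $f^*$ (the true weight vector) have closed forms; the specific model, noise level, and constants below are illustrative, not part of the question.

```python
import numpy as np

# Monte Carlo sketch of the ERM inequality above.
# Assumed toy setup: well-specified linear model, squared loss.
rng = np.random.default_rng(0)
d, n, n_trials = 5, 50, 2000
w_star = rng.normal(size=d)  # f*(x) = w_star @ x, the population minimizer

train_loss_erm, train_loss_fstar = [], []
for _ in range(n_trials):
    X = rng.normal(size=(n, d))
    y = X @ w_star + rng.normal(scale=0.5, size=n)  # i.i.d. sample from P
    w_hat = np.linalg.lstsq(X, y, rcond=None)[0]    # \hat f: exact ERM under squared loss
    train_loss_erm.append(np.mean((X @ w_hat - y) ** 2))
    train_loss_fstar.append(np.mean((X @ w_star - y) ** 2))

# Averaging over trials approximates the expectations: the first number should
# not exceed the second, and the second approximates the population risk of f*.
print(np.mean(train_loss_erm), np.mean(train_loss_fstar))
```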
To elaborate, my question is about obtaining a similar inequality for a model $\tilde f_t$ obtained after taking $t$ gradient steps, with learning rate $\eta$, to minimize the same empirical loss $\frac{1}{n}\sum_i \ell(f(x_i),y_i)$.
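For concreteness, here is a minimal sketch of this gradient-descent setting under the same assumed linear/squared-loss toy setup; the starting point, step count, and learning rate are hypothetical and only illustrate what $\tilde f_t$, $t$, and $\eta$ mean here.

```python
import numpy as np

# Sketch of the \tilde f_t setting (assumed toy setup: linear model, squared loss):
# instead of solving the ERM exactly, take t full-batch gradient steps on the
# training loss (1/n) * sum_i (w @ x_i - y_i)^2 with learning rate eta.
def gd_iterate(X, y, t, eta):
    n, d = X.shape
    w = np.zeros(d)                            # \tilde f_0: start from zero
    for _ in range(t):
        grad = (2.0 / n) * X.T @ (X @ w - y)   # gradient of the training loss at w
        w = w - eta * grad                     # one gradient step
    return w

rng = np.random.default_rng(1)
d, n = 5, 50
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_star + rng.normal(scale=0.5, size=n)

w_t = gd_iterate(X, y, t=100, eta=0.05)
w_hat = np.linalg.lstsq(X, y, rcond=None)[0]
# The iterate's training loss approaches the ERM training loss from above, but
# after finitely many steps it need not attain it, which is why the argument
# behind the ERM inequality does not transfer directly to \tilde f_t.
print(np.mean((X @ w_t - y) ** 2), np.mean((X @ w_hat - y) ** 2))
```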