Asked on Cross Validated, November 14, 2021
I have a regression task, for which I'm training a model with MSE loss. So for a label $y$ and an estimate $\hat{y}$ the loss is
$$\ell(y,\hat{y})=(y-\hat{y})^2$$
However, there is uncertainty in the "true" labels, and it varies from label to label. Each true label is drawn from a distribution for which I can obtain a reasonable estimate of any statistic, e.g. the standard deviation.
I’d like the loss to reflect the variation in the true label $y$. I thought about simply normalizing by the standard deviation of each label
$$\ell\left(y,\hat{y}\right)=\left(\frac{y-\hat{y}}{\sigma\left(y\right)}\right)^{2}$$
Or, since sometimes $\sigma(y)=0$, maybe
$$\ell\left(y,\hat{y}\right)=\left(\frac{y-\hat{y}}{1+\sigma\left(y\right)}\right)^{2}$$
But this seems too ad-hoc. Is there a standard theory or approach people use in this sort of situation?
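For concreteness, here is a minimal sketch of the normalized loss I have in mind, assuming a PyTorch-style setup; the tensor names and the small `eps` guard against $\sigma(y)=0$ are only an illustration:

```python
import torch

def std_normalized_mse(y_hat, y, sigma_y, eps=1e-8):
    """Squared error scaled by each label's estimated standard deviation.

    sigma_y holds the per-label standard deviations; eps guards against
    labels whose estimated sigma is zero.
    """
    return torch.mean(((y - y_hat) / (sigma_y + eps)) ** 2)
```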
The usual approach in statistics is to treat the errors $\epsilon_i = y_i - E[y_i \mid x_i]$ as homoscedastic with variance $\sigma^2$. This assumption, together with independence, leads to least squares as the loss function for estimating $E[y_i \mid x_i]$.
If your measurements of $y$ are themselves variable, the error variance becomes $\sigma^2 + \sigma(y_i)^2$. This leads to the weighted loss $\sum_i w_i(y_i-\hat y_i)^2$ with inverse-variance weights $w_i = \left(\sigma^2 + \sigma(y_i)^2\right)^{-1}$.
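As a minimal sketch of this weighted loss, assuming the per-label variances $\sigma(y_i)^2$ are available as a tensor and $\sigma^2$ is treated as a known scalar for the moment (the names and PyTorch setup are illustrative):

```python
import torch

def inverse_variance_weighted_mse(y_hat, y, sigma_y_sq, resid_var):
    """Weighted squared error with w_i = 1 / (resid_var + sigma_y_sq_i).

    resid_var plays the role of the residual variance sigma^2;
    sigma_y_sq holds the per-label variances sigma(y_i)^2.
    """
    w = 1.0 / (resid_var + sigma_y_sq)
    return torch.mean(w * (y - y_hat) ** 2)
```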
The problem is that $\sigma^2$, the residual variance, is not known and has to be estimated; yet it cannot be estimated after the rest of the model has been fit, because the loss function itself depends on it. The solution is given by Iteratively Reweighted Least Squares (IRLS). It is a quite intuitive algorithm; one simple explanation is available in section 2.3 of this document.
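Below is a minimal IRLS-style sketch for a linear model, alternating between a weighted least-squares fit and a moment-based update of $\sigma^2$ via $E[(y_i-\hat y_i)^2] = \sigma^2 + \sigma(y_i)^2$; the update rule, tolerance, and function name are illustrative assumptions, not the exact scheme of the referenced document:

```python
import numpy as np

def irls_known_label_variance(X, y, sigma_y_sq, n_iter=20, tol=1e-8):
    """Alternate between refitting weighted least squares and updating sigma^2.

    X: (n, p) design matrix, y: (n,) targets,
    sigma_y_sq: (n,) known per-label variances sigma(y_i)^2.
    """
    resid_var = 1.0  # initial guess for the residual variance sigma^2
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Weighted least squares with w_i = 1 / (sigma^2 + sigma(y_i)^2).
        w = 1.0 / (resid_var + sigma_y_sq)
        Xw = X * w[:, None]
        beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)
        # Moment-based update: E[(y_i - x_i beta)^2] = sigma^2 + sigma(y_i)^2.
        resid_sq = (y - X @ beta) ** 2
        new_var = max(np.mean(resid_sq - sigma_y_sq), 0.0)
        if abs(new_var - resid_var) < tol:
            resid_var = new_var
            break
        resid_var = new_var
    return beta, resid_var
```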
Answered by carlo on November 14, 2021