Cross Validated Asked by Snoopy on January 10, 2021
How would one prove that the expected value of the residuals from OLS regression is zero? I will consider two cases. In the first case I treat $X_i$ as random, and in the second case I treat it as non-random.
First case. We know that $\hat{u}_i = y_i - \hat{y}_i$. Taking the expectation, $E[\hat{u}_i] = E[y_i] - E[\hat{y}_i]$. Now, we know from the solution of the OLS minimisation problem that $\bar{y} = \bar{\hat{y}}$ because $\bar{\hat{u}} = 0$. If we take probability limits, $\operatorname{plim} \bar{y} = \operatorname{plim} \bar{\hat{y}}$. By the law of large numbers this leads to $E[y_i] = E[\hat{y}_i]$. Hence, $E[\hat{u}_i] = 0$. Is this proof correct? Besides, how would one interpret $E[\hat{u}_i]$? $\hat{u}_i$ results from a given sample, whereas expectation is a population concept. If we take the expectation of a residual, what would this represent? The sample mean of the residual in the long run, or the population mean?
Second case. This is easy: $E[\hat{u}] = E[My] = E[Mu] = M E[u] = 0$, because $My = MX\beta + Mu$ with $MX = 0$, because $M$ is a function of the non-random $X$ and hence can be taken out of the expectation operator, and because $E[u] = 0$. Here $M = I - P$ is the annihilator matrix and $P$ the projection matrix. But my question is not about this case where $X$ is non-random, but about the first case above, where it is random.
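As a quick numerical check of this algebra (a minimal sketch; the design, coefficients, and seed are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # design with intercept
beta = np.array([1.0, 2.0, -0.5])
u = rng.normal(size=n)
y = X @ beta + u

P = X @ np.linalg.inv(X.T @ X) @ X.T  # projection matrix P
M = np.eye(n) - P                     # annihilator matrix M = I - P

print(np.allclose(M @ X, 0))          # True: MX = 0
print(np.allclose(M @ y, M @ u))      # True: My = Mu, i.e. the residuals
```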
Using OLS estimation, the residuals can be written using the hat matrix $\mathbf{h} = \boldsymbol{X} (\boldsymbol{X}^\text{T} \boldsymbol{X})^{-1} \boldsymbol{X}^\text{T}$ as follows:
$$\begin{equation} \begin{aligned} \mathbf{r} &= (\mathbf{I}-\mathbf{h}) \boldsymbol{Y} \\[6pt] &= (\mathbf{I}-\mathbf{h}) (\boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}) \\[6pt] &= (\mathbf{I}-\mathbf{h}) \boldsymbol{X} \boldsymbol{\beta} + (\mathbf{I}-\mathbf{h}) \boldsymbol{\varepsilon} \\[6pt] &= \mathbf{0} + (\mathbf{I}-\mathbf{h}) \boldsymbol{\varepsilon} \\[6pt] &= (\mathbf{I}-\mathbf{h}) \boldsymbol{\varepsilon}. \\[6pt] \end{aligned} \end{equation}$$
So, assuming the error terms have zero conditional mean, you have:
$$\mathbb{E}(\mathbf{r}|\boldsymbol{X}) = \mathbb{E}((\mathbf{I}-\mathbf{h}) \boldsymbol{\varepsilon}|\boldsymbol{X}) = (\mathbf{I}-\mathbf{h}) \mathbb{E}(\boldsymbol{\varepsilon}|\boldsymbol{X}) = \mathbf{0}.$$
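To see this conditional-mean statement numerically, here is a small Monte Carlo sketch (illustrative setup; $\boldsymbol{X}$ is held fixed while the errors are redrawn many times):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # design, held fixed
beta = np.array([0.5, 1.5])

h = X @ np.linalg.inv(X.T @ X) @ X.T  # hat matrix h
residual_maker = np.eye(n) - h        # I - h

n_sims = 20_000
avg_r = np.zeros(n)
for _ in range(n_sims):
    eps = rng.normal(size=n)               # zero-mean errors, new draw each time
    r = residual_maker @ (X @ beta + eps)  # residuals r = (I - h) Y
    avg_r += r / n_sims

print(np.abs(avg_r).max())  # near 0: Monte Carlo estimate of E(r | X)
```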
Correct answer by Ben on January 10, 2021
How would one prove that the expected value of the residuals from OLS regression is zero?
In the linear regression framework, many problems can arise from the so-called error term. However, you are asking unambiguously about residuals in the OLS context.
Then the expected value of the residuals is zero by construction, provided the regression includes an intercept (a column of ones in $\boldsymbol{X}$). The algebra demands it; it comes from the first-order conditions of the OLS minimisation problem. The usual assumptions about the error term play no role.
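To make the first-order-condition step explicit (standard OLS algebra, assuming an intercept as above):

$$\min_{\boldsymbol{b}} \; (\boldsymbol{Y} - \boldsymbol{X}\boldsymbol{b})'(\boldsymbol{Y} - \boldsymbol{X}\boldsymbol{b}) \quad\Longrightarrow\quad \boldsymbol{X}'(\boldsymbol{Y} - \boldsymbol{X}\boldsymbol{b}) = \boldsymbol{X}'\mathbf{r} = \mathbf{0},$$

and since one column of $\boldsymbol{X}$ is $\mathbf{1}$, one of these equations is exactly $\mathbf{1}'\mathbf{r} = 0$.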
Following Ben's notation we can write
$$\begin{equation} \begin{aligned} \mathbf{1}' \mathbf{r} &= \mathbf{1}'(\mathbf{Y} - \hat{\mathbf{Y}}) = \mathbf{1}' \mathbf{Y} - \mathbf{1}' \hat{\mathbf{Y}} = 0 \end{aligned} \end{equation}$$
Therefore not only is the expected value zero, but the sum of the residuals is exactly zero too, always ($\mathbf{1}$ is a vector of ones).
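A brief numerical illustration of this point (a sketch with made-up numbers): even if the errors have a nonzero mean, the residuals from a regression with an intercept still sum to zero, because the normal equations force it.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept included
u = rng.normal(loc=5.0, size=n)   # errors with mean 5, deliberately not 0
y = X @ np.array([1.0, 2.0]) + u

b, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS coefficients
r = y - X @ b                              # residuals

print(r.sum())  # zero up to floating-point error, despite E[u] != 0
```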
The problem with Ben's explanation is in the second row:
$$\begin{equation} \begin{aligned} \mathbf{r} &= (\mathbf{I}-\mathbf{h}) \boldsymbol{Y} \\[6pt] &= (\mathbf{I}-\mathbf{h}) (\boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}) \\[6pt] \end{aligned} \end{equation}$$
From this decomposition of $\boldsymbol{Y}$, the assumption about the error term $\mathbb{E}(\boldsymbol{\varepsilon}|\boldsymbol{X}) = \mathbf{0}$ seems to be needed, but it is not. It is important to note that the hat matrix ($\mathbf{h}$) should be applied with the OLS parameters: $\mathbf{h} \boldsymbol{Y} = \boldsymbol{X}\boldsymbol{b} = \hat{\mathbf{Y}}$.
Finally, we can verify that, by construction, $\mathbb{E} [\boldsymbol{Y} | \boldsymbol{X} ] = \boldsymbol{X}\boldsymbol{b} = \hat{\mathbf{Y}}$
and $\boldsymbol{Y} = \hat{\mathbf{Y}} + \mathbf{r}$,
therefore
$$\mathbb{E} [\mathbf{r} | \boldsymbol{X} ] = \mathbb{E} [\boldsymbol{Y} | \boldsymbol{X}] - \mathbb{E} [\hat{\mathbf{Y}} | \boldsymbol{X}] = \boldsymbol{0},$$
by construction too.
As a side note:
Expectation is a population concept. If we take the expectation of a residual, what would this represent? The sample means of a residual term in the long run or population?
Expectation is a general concept; you can apply it even to a single observation, for example a single coin toss. The proper application depends on the context and the question.
Your context is linear regression estimated with OLS, in which the residuals are a well-defined object. It is important to note that residuals are estimated quantities: you compute them. Errors are a different thing; they are outside the researcher's control and unobservable, and for this reason you have to make assumptions about them.
Something like "population residuals" is an ambiguous object. Residuals are all in your hands, always. You can think about these in a scheme where the amount of observations go to infinity or cover all the population. But nothing change in the above algebra and implications; them depend from the so called Geometry of OLS.
That said, residuals remain interpretable as random variables and you can compute their expectation. Without loss of generality you can think of the expectation of residuals (or of estimators, etc.) as conditional on the regressors ($\mathbb{E}(\mathbf{r}|\boldsymbol{X})$), so it does not matter whether they are stochastic or not.
Answered by markowitz on January 10, 2021