Cross Validated, asked on November 2, 2021
I’m reading about test/generalization error in Hastie et al.’s Elements of Statistical Learning (2nd ed). In section 7.4, it is written that given a training set $\mathcal{T} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, the expected generalization error of a model $\hat{f}$ is $$\mathrm{Err} = E_{\mathcal{T}}[E_{X^0, Y^0}[L(Y^0, \hat{f}(X^0))\mid\mathcal{T}]],$$
where the point $(X^0, Y^0)$ is a new test data point, drawn from $F,$ the joint distribution of the data.
Suppose my model is a linear regression (OLS) model, that is, $\hat{f}(X) = X\hat{\beta} = X(X^TX)^{-1}X^TY$, assuming that $X$ has full column rank. My question is, what does it mean to (1) take the expected value over $X^0, Y^0$, and (2) take the expected value over the training set $\mathcal{T}$?
For example, suppose $Y = X\beta + \epsilon$, where $E[\epsilon]=0$ and $\mathrm{Var}(\epsilon) = \sigma^2 I$.
(1) Consider evaluating $E_{X^0, Y^0}[X^0\hat{\beta}\mid\mathcal{T}]$. Is the following correct?
\begin{align*}
E_{X^0, Y^0}[X^0\hat{\beta}\mid\mathcal{T}] &= E_{X^0, Y^0}[X^0(X^TX)^{-1}X^TY\mid\mathcal{T}]\\
&= E_{X^0, Y^0}[X^0\mid\mathcal{T}](X^TX)^{-1}X^TY\\
&= E_{X^0, Y^0}[X^0](X^TX)^{-1}X^TY
\end{align*}
The last equality holds if $X^0$ is independent of the training set $\mathcal{T}$.
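The conditional expectation in (1) can be checked numerically. Below is a minimal sketch under hypothetical assumptions (Gaussian covariates with $E[X^0]=0$, $\sigma^2=1$, an arbitrary $\beta$): the training set is drawn once and then held fixed, and only fresh test points $X^0$ are averaged over.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: fixed training set T, chosen true beta, Gaussian X.
N, p = 100, 3
beta = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(N, p))           # training covariates
Y = X @ beta + rng.normal(size=N)     # training responses
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)  # OLS fit on T, held fixed

# Monte Carlo over fresh test points X^0 ~ N(0, I), with T fixed:
X0 = rng.normal(size=(200_000, p))
mc = (X0 @ beta_hat).mean()

# Analytic value E[X^0] beta_hat, which is 0 here since E[X^0] = 0.
analytic = np.zeros(p) @ beta_hat
print(mc, analytic)
```

The Monte Carlo average lands near the analytic value $E[X^0]\hat{\beta} = 0$, consistent with pulling $\hat{\beta}$ (a function of $\mathcal{T}$ only) out of the inner expectation.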
(2) Consider evaluating $E_{\mathcal{T}}[X^0\hat{\beta}\mid X^0]$. Is the following correct?
\begin{align*}
E_{\mathcal{T}}[X^0\hat{\beta}\mid X^0] &= X^0 E_{\mathcal{T}}[(X^TX)^{-1}X^TY\mid X^0]\\
&= X^0 (X^TX)^{-1}X^T E_{\mathcal{T}}[Y\mid X^0]\\
&= X^0 (X^TX)^{-1}X^TX\beta
\end{align*}
The second equality holds assuming that the covariates $X$ are fixed by design, so the only thing that is random with respect to the training set $\mathcal{T}$ is $Y$, correct?
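The fixed-design reading of (2) can also be simulated. In this sketch (hypothetical choices: $N=50$, $\sigma^2=1$, an arbitrary $\beta$), the design matrix $X$ is drawn once and frozen, and only $\epsilon$ (hence $Y$) is redrawn for each replicated training set; the average of $\hat{\beta}$ across replications should match $(X^TX)^{-1}X^TX\beta = \beta$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed design: X is held constant across replications of T.
N, p = 50, 2
beta = np.array([2.0, -1.0])
X = rng.normal(size=(N, p))         # drawn once, then fixed by design
H = np.linalg.solve(X.T @ X, X.T)   # (X^T X)^{-1} X^T, also fixed

reps = 20_000
beta_hats = np.empty((reps, p))
for r in range(reps):
    Y = X @ beta + rng.normal(size=N)  # only epsilon is redrawn
    beta_hats[r] = H @ Y

# Average over training sets: E_T[beta_hat] = beta under a fixed design.
print(beta_hats.mean(axis=0))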
You can drop all subscripts on the expected values, and via the Law of Total Expectation we have $$\text{Err}=\mathbb E[\mathbb E[L(Y^0,\hat f(X^0))\mid\mathcal T]]=\underbrace{\mathbb E[L(Y^0,\hat f(X^0))]}_{\text{Expected Loss}}$$
In the end, we're interested in knowing the expected loss. The conditioning is important because, as Hastie et al. explain in the subsequent sections, the outer expected value is estimated via cross-validation. You can calculate it analytically only if you know the distribution of the data, i.e. the distribution of $\mathcal T$.
(1) is calculated correctly. (2) is not correct, because the expected value is taken with respect to the distribution of $\mathcal T$, so $X$ is not fixed (is $X$ fixed in cross-validation?). The only thing that's fixed in $E_{\mathcal{T}}[X^0\hat{\beta}\mid X^0]=\mathbb E[X^0\hat \beta\mid X^0]$ is $X^0$, because it's on the given side of the expression. Without knowing the data distribution, you can't calculate this expected value analytically; instead, you can estimate it via cross-validation.
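To make the "both $\mathcal T$ and $(X^0, Y^0)$ are random" point concrete, here is a sketch that estimates Err by a double Monte Carlo: the outer loop redraws the whole training set (covariates included, per the point above), the inner average uses fresh test points. The distributions, $\beta$, and sizes are illustrative assumptions, with squared-error loss and $\sigma^2 = 1$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Err = E_T[ E_{X^0,Y^0}[ (Y^0 - X^0 beta_hat)^2 | T ] ], squared-error loss.
N, p = 50, 3
beta = np.array([1.0, 0.0, -1.0])

def draw_set(n):
    """Draw n i.i.d. points from the assumed data distribution F."""
    X = rng.normal(size=(n, p))
    Y = X @ beta + rng.normal(size=n)
    return X, Y

outer = []
for _ in range(200):                      # expectation over T (X random too)
    X, Y = draw_set(N)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    X0, Y0 = draw_set(5_000)              # expectation over (X^0, Y^0) given T
    outer.append(np.mean((Y0 - X0 @ beta_hat) ** 2))

# Estimate sits slightly above the irreducible sigma^2 = 1, the excess
# coming from the variability of beta_hat across training sets.
print(np.mean(outer))
```

Because the data distribution is known here, the estimate can be compared against theory; in practice the distribution is unknown, which is why the outer expectation is estimated by cross-validation instead.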
Answered by gunes on November 2, 2021