Cross Validated Asked by Javier Mariño on November 16, 2021
I am studying regression for the first time and some questions have come up. First, is linear regression an estimate of the conditional expectation? And is that estimate of the conditional expectation the so-called $\hat{y}$? That is:
$$y=E(Y|X)+e$$ $$y=\hat{y}+e$$ $$\hat{y}=E(Y|X)$$
Second, is linearity in the parameters an assumption that linear regression makes in order to estimate the conditional expectation?
Third, Hansen’s book on econometrics says about this problem: "the linear CEF model is empirically unlikely to be accurate unless $x$ is discrete and low-dimensional so all interactions are included. Consequently in most cases it is more realistic to view the linear specification as an approximation". How should this statement be interpreted?
(Don’t read this parenthetical part for a few months or years until you’re much more comfortable with regression. The subtle point is that we often don’t see the predictors as random variables, so there isn’t a multivariate distribution where we condition on many variables to examine $Y$. We think of $Y \vert X$ as a family of univariate distributions that are parameterized by the predictor variables. This is technically correct in many cases but not especially useful, particularly not to a beginner.)
For the first two, I think it makes sense when you start simulating regressions. I’ll let you think about how to do that and can come back and edit this answer with some R code. But I do think it’s a good exercise to think through it for a while.
Answered by Dave on November 16, 2021
Linear regression provides the minimum mean squared error linear-in-parameters approximation to the conditional expectation function (CEF). Just as a Taylor series with enough terms can approximate a smooth function, a regression with enough interactions and polynomial terms can approximate the CEF pretty well even if the actual CEF is nonlinear, as long as you have enough data and have not left anything important out of your model.
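To spell out the first sentence, here is a short sketch. It writes $m(X)=E(Y|X)$ for the CEF and $b$ for a candidate coefficient vector; this notation is introduced here for illustration and is not taken from the question. The population least-squares coefficient solves
$$\beta=\arg\min_b E[(Y-X'b)^2]$$
and because $Y-m(X)$ is uncorrelated with any function of $X$ (law of iterated expectations), the cross term vanishes and the objective splits as
$$E[(Y-X'b)^2]=E[(Y-m(X))^2]+E[(m(X)-X'b)^2]$$
The first term does not involve $b$, so minimizing the squared prediction error for $Y$ is the same as minimizing the squared approximation error for the CEF.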
If your world truly is low-dimensional and discrete, you can calculate the mean in each cell (like average wage for college educated Asian women who live in the Midwest and enjoy musical theatre), and your approximation of the CEF could be very good. This is what it means to include all interactions. With continuous covariates this is harder, since you have to either bin your data or smooth it to interpolate the unobserved data, and the approximation can be quite poor.
Here's a toy example where we approximate a fairly non-linear Poisson CEF $$E[Y \vert X,Z] = \exp(a + b \cdot X + c \cdot Z + d \cdot X \cdot Z)$$ with cell means and with a regression that includes all interactions. Here X takes on 5 values and Z takes on 2, so we have 10 cells in total if we use dummy variables:
. set obs 5
number of observations (_N) was 0, now 5
. gen x = _n
. expand 100
(495 observations created)
. gen z = mod(_n,2)
. gen y = rpoisson(x+2*z)
. table x z, c(mean y)
----------------------
          |      z
        x |    0     1
----------+-----------
        1 | 1.06  2.76
        2 | 2.04  4.16
        3 | 2.96  4.96
        4 | 4.26  6.58
        5 | 5.18  6.76
----------------------
. quietly reg y i.x#i.z
. margins x#z
Adjusted predictions Number of obs = 500
Model VCE : OLS
Expression : Linear prediction, predict()
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x#z |
1 0 | 1.06 .2874746 3.69 0.000 .495165 1.624835
1 1 | 2.76 .2874746 9.60 0.000 2.195165 3.324835
2 0 | 2.04 .2874746 7.10 0.000 1.475165 2.604835
2 1 | 4.16 .2874746 14.47 0.000 3.595165 4.724835
3 0 | 2.96 .2874746 10.30 0.000 2.395165 3.524835
3 1 | 4.96 .2874746 17.25 0.000 4.395165 5.524835
4 0 | 4.26 .2874746 14.82 0.000 3.695165 4.824835
4 1 | 6.58 .2874746 22.89 0.000 6.015165 7.144835
5 0 | 5.18 .2874746 18.02 0.000 4.615165 5.744835
5 1 | 6.76 .2874746 23.52 0.000 6.195165 7.324835
------------------------------------------------------------------------------
. quietly poisson y i.x#i.z
. margins x#z
Adjusted predictions Number of obs = 500
Model VCE : OIM
Expression : Predicted number of events, predict()
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x#z |
1 0 | 1.06 .1456022 7.28 0.000 .7746249 1.345375
1 1 | 2.76 .2349468 11.75 0.000 2.299513 3.220487
2 0 | 2.04 .2019901 10.10 0.000 1.644107 2.435893
2 1 | 4.16 .2884441 14.42 0.000 3.59466 4.72534
3 0 | 2.96 .2433105 12.17 0.000 2.48312 3.43688
3 1 | 4.96 .3149603 15.75 0.000 4.342689 5.577311
4 0 | 4.26 .2918904 14.59 0.000 3.687905 4.832095
4 1 | 6.58 .3627671 18.14 0.000 5.868989 7.291011
5 0 | 5.18 .3218695 16.09 0.000 4.549147 5.810853
5 1 | 6.76 .3676955 18.38 0.000 6.03933 7.48067
------------------------------------------------------------------------------
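It is not a coincidence that the OLS and Poisson margins both equal the raw cell means. A short sketch of why, assuming a parameterization with one dummy per cell $c$ and no constant (the `i.x#i.z` specification with a constant spans the same columns, so it has the same fitted values), where $n_c$ is the number of observations in cell $c$ and $\bar{y}_c$ is its sample mean. The OLS first-order condition for the coefficient $\hat{\beta}_c$ on the cell-$c$ dummy is
$$\sum_{i \in c}(y_i-\hat{\beta}_c)=0 \quad\Rightarrow\quad \hat{\beta}_c=\frac{1}{n_c}\sum_{i \in c}y_i=\bar{y}_c$$
and the Poisson score equation for the corresponding log-mean coefficient $\hat{\gamma}_c$ is
$$\sum_{i \in c}(y_i-\exp(\hat{\gamma}_c))=0 \quad\Rightarrow\quad \exp(\hat{\gamma}_c)=\bar{y}_c$$
Only the standard errors differ. OLS uses one pooled residual variance for every cell, while the Poisson model sets the variance equal to the mean, which gives $\sqrt{\bar{y}_c/n_c}$, for example $\sqrt{1.06/50}\approx 0.1456$ for the first cell.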
If you omit the interaction between X and Z, you get something slightly worse:
. quietly reg y i.x i.z
. margins x#z
Adjusted predictions Number of obs = 500
Model VCE : OLS
Expression : Linear prediction, predict()
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x#z |
1 0 | 1.024 .2111675 4.85 0.000 .6091028 1.438897
1 1 | 2.936 .2111675 13.90 0.000 2.521103 3.350897
2 0 | 1.914 .2111675 9.06 0.000 1.499103 2.328897
2 1 | 3.826 .2111675 18.12 0.000 3.411103 4.240897
3 0 | 3.324 .2111675 15.74 0.000 2.909103 3.738897
3 1 | 5.236 .2111675 24.80 0.000 4.821103 5.650897
4 0 | 3.854 .2111675 18.25 0.000 3.439103 4.268897
4 1 | 5.766 .2111675 27.31 0.000 5.351103 6.180897
5 0 | 5.084 .2111675 24.08 0.000 4.669103 5.498897
5 1 | 6.996 .2111675 33.13 0.000 6.581103 7.410897
------------------------------------------------------------------------------
This is an example of misspecification.
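The same point can be checked without sampling noise. The sketch below is in plain Python rather than Stata and is only an illustration: it takes the exponential CEF form from the toy example, plugs in made-up coefficients `a, b, c, d` (hypothetical values, not estimates from the data above), puts equal weight on each of the 10 cells, and regresses the CEF itself on a saturated and on an additive set of dummies. The helper names `solve`, `ols_fit` and `dummies` are mine.

```python
import math
from itertools import product

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [u - f * v for u, v in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols_fit(X, y):
    """Return OLS fitted values from the normal equations X'X beta = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return [sum(bj * v for bj, v in zip(beta, r)) for r in X]

def dummies(value, levels):
    """One indicator per level: 1.0 where value equals the level, else 0.0."""
    return [1.0 if value == lev else 0.0 for lev in levels]

# Hypothetical coefficients for exp(a + b*x + c*z + d*x*z); illustration only.
a, b, c, d = 0.0, 0.3, 0.5, -0.05
cells = list(product(range(1, 6), range(2)))  # x in 1..5, z in 0..1
cef = [math.exp(a + b * x + c * z + d * x * z) for x, z in cells]

# Saturated design: one dummy per (x, z) cell, no constant.
X_sat = [dummies((x, z), cells) for x, z in cells]
# Additive design: constant, dummies for x = 2..5, dummy for z = 1.
X_add = [[1.0] + dummies(x, range(2, 6)) + [float(z)] for x, z in cells]

fit_sat = ols_fit(X_sat, cef)
fit_add = ols_fit(X_add, cef)

print(" x z     CEF  saturated  additive")
for (x, z), m, s, t in zip(cells, cef, fit_sat, fit_add):
    print(f"{x:2d} {z}  {m:6.3f}  {s:9.3f}  {t:8.3f}")
```

Under these assumptions the saturated column should match the CEF in every cell, while the additive column is only the best main-effects approximation: its errors average to zero across cells but are not zero cell by cell.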
Answered by dimitriy on November 16, 2021