Cross Validated Asked on December 8, 2021
I want to fit a linear model with several variables. The model is $y = \beta_0 + \beta_1 x + \beta_2 z + \varepsilon$, where $x$ is a continuous variable and $z$ is a dummy variable, i.e. $z \in \{0, 1\}$. The goal of the modeling is to inspect and compare the effects of $x$ and $z$ on $y$. My questions are:
(1) Is it meaningful to compare the effect of a continuous variable with that of a dummy variable?
(2) If so, should I normalize the observations of $x$ before fitting the model?
PS: In linear regression it is common practice to encode a categorical covariate as several dummy variables. That is why the title says "categorical variable" while the problem setup uses "dummy variable".
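As a small illustration of that encoding in R (the factor `g` and its levels are hypothetical, not part of the question), `model.matrix()` expands a factor into an intercept column plus one 0/1 column per non-reference level under R's default treatment contrasts. The same mechanism is why the output of the experiment in this question labels the dummy's coefficient `zTRUE`: the logical `z` is treated as a two-level factor, and `zTRUE` is the effect of TRUE relative to FALSE.

```r
# hypothetical categorical covariate with three levels; "a" is the reference level
g <- factor(c("a", "b", "c", "b"))

# design matrix under R's default treatment contrasts:
# an intercept column plus one 0/1 dummy column per non-reference level
mm <- model.matrix(~ g)
print(mm)
print(colnames(mm))
```

With $k$ levels this gives $k - 1$ dummy columns; adding a column for the reference level as well would be perfectly collinear with the intercept.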
As far as I know, normalization is not mandatory in linear regression (LR) if the goal is prediction. But if the focus is on comparing the effects of different covariates on the response variable, normalization is needed so that the covariates are on comparable scales.
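One common way to put a continuous covariate on a comparable scale is to standardize it before fitting. The sketch below is illustrative only: it reuses the simulation settings from the experiment in this question, standardizes $x$ with its *sample* mean and standard deviation via `scale()`, and leaves the dummy $z$ as 0/1. The names `fit_raw`, `fit_std`, `x_std`, `b_raw` and `b_std` are introduced just for this sketch. The slope on the standardized variable is the effect of a one-standard-deviation change in $x$, which equals the raw slope multiplied by $\mathrm{sd}(x)$, while the coefficient of $z$ does not change.

```r
# Illustrative sketch: same simulation settings as the experiment in this question
set.seed(1234)
N <- 10000
x <- rnorm(N, 2, 2)          # continuous covariate
z <- rnorm(N) > 0            # dummy covariate (logical TRUE/FALSE)
y <- 0.8 - 1.2 * x + 1.3 * z + rnorm(N)

# fit with x on its original scale
fit_raw <- lm(y ~ x + z)

# standardize x with its sample mean and sd; z is left as 0/1
x_std <- as.numeric(scale(x))
fit_std <- lm(y ~ x_std + z)

# slope per one-sd change in x, next to the raw slope rescaled by sd(x)
b_raw <- unname(coef(fit_raw)["x"])
b_std <- unname(coef(fit_std)["x_std"])
print(c(b_std = b_std, b_raw_times_sd = b_raw * sd(x)))

# the dummy's coefficient is the same in both fits
print(coef(fit_raw)["zTRUE"])
print(coef(fit_std)["zTRUE"])
```

Standardizing only changes the units in which the effect of $x$ is expressed; whether a one-standard-deviation change in $x$ is comparable to switching $z$ from 0 to 1 remains a judgment call. Gelman (2008) suggests dividing continuous inputs by two standard deviations precisely so their coefficients are roughly comparable to those of untransformed binary predictors.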
In the scenario presented here, a simple numerical experiment shows that whether or not $x$ is normalized does affect the estimated regression coefficients: in the output below, the intercept and the coefficient of $x$ change, while the coefficient of $z$ stays the same.
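```r
set.seed(1234)
N = 10000
# true coefficients
b_0 = 0.8
b_1 = -1.2
b_2 = 1.3
# x: continuous, drawn from a normal distribution with mean 2 and sd 2
x = rnorm(N, 2, 2)
# z: dummy (logical TRUE/FALSE, each with probability 1/2)
z = rnorm(N) > 0
y = b_0 + b_1*x + b_2*z + rnorm(N)

# fit with x on its original scale
lm(y ~ x + z)
# Call:
# lm(formula = y ~ x + z)
#
# Coefficients:
# (Intercept)            x        zTRUE
#      0.7973      -1.2076       1.3441

# fit with x normalized by its true mean (2) and sd (2);
# scale(x) would use the sample mean and sd instead
x2 = (x - 2)/2
lm(y ~ x2 + z)
# Call:
# lm(formula = y ~ x2 + z)
#
# Coefficients:
# (Intercept)           x2        zTRUE
#      -1.618       -2.415        1.344
```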
Note: this code is modified from abalter.
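A short derivation, added here as a sketch, shows why only the intercept and the slope of $x$ moved. Write $\tilde{x} = (x - 2)/2$ for the variable `x2` in the code (the symbol $\tilde{x}$ is introduced only for this derivation), so that $x = 2\tilde{x} + 2$. Substituting into the model:

$$y = \beta_0 + \beta_1 (2\tilde{x} + 2) + \beta_2 z + \varepsilon = (\beta_0 + 2\beta_1) + 2\beta_1 \tilde{x} + \beta_2 z + \varepsilon.$$

So in terms of $\tilde{x}$ the intercept is $\beta_0 + 2\beta_1$, the slope is $2\beta_1$, and the coefficient of $z$ is still $\beta_2$. Because least squares with an intercept is equivariant under such an affine rescaling of a covariate, the estimates obey the same relations, which matches the output: $2 \times (-1.2076) = -2.4152 \approx -2.415$ and $0.7973 + 2 \times (-1.2076) = -1.6179 \approx -1.618$, while the coefficient of $z$ stays at about $1.344$. The fitted values are identical in both models; only the units in which the effect of $x$ is expressed differ.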