# Categorical variable as explanatory variable (right-hand side)

Economics: asked by Marcel Campion on December 29, 2020

In a linear probability model, or any other regression, one can estimate fixed effects simply by adding `i.something` to the STATA command. This “something” can be a village, a county or a country. When doing so, one looks at variation within this geographical unit, as follows:

$Y_{ivt} = B_0 + B_1 X_{it} + B_2 X_{vt} + \alpha_v + \varepsilon_{ivt}$

where the indices $i$, $v$ and $t$ denote the individual, village and time dimensions, respectively. The term $\alpha_v$ is the village fixed effect, so the regression uses only within-village variation.

STATA code (1): `reg Y Var1 Var2 i.village, vce(cluster village)`
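The claim that village dummies restrict the regression to within-village variation can be checked numerically. By the Frisch-Waugh-Lovell theorem, the slope from a regression with a full set of village dummies equals the slope from regressing within-village deviations of Y on within-village deviations of X. The sketch below uses simulated data (all names and parameter values are made up for illustration) and plain NumPy least squares instead of STATA.

```python
import numpy as np

def dummies(codes):
    """0/1 indicator columns for every level except the first (the base level)."""
    levels = np.unique(codes)[1:]
    return (codes[:, None] == levels[None, :]).astype(float)

def demean_within(v, groups):
    """Subtract each group's mean of v from the observations in that group."""
    out = v.astype(float)  # astype returns a copy, so v is not modified
    for g in np.unique(groups):
        mask = groups == g
        out[mask] -= v[mask].mean()
    return out

rng = np.random.default_rng(0)
n_villages, per_village = 20, 30
village = np.repeat(np.arange(n_villages), per_village)
alpha = rng.normal(0.0, 2.0, n_villages)[village]         # village fixed effect alpha_v
x = rng.normal(size=village.size) + 0.5 * alpha           # regressor correlated with alpha_v
y = 1.0 + 2.0 * x + alpha + rng.normal(size=village.size)  # true slope = 2

# (a) Dummy-variable regression, analogous to `reg Y X i.village`
X_dummy = np.column_stack([np.ones_like(x), x, dummies(village)])
b_dummy = np.linalg.lstsq(X_dummy, y, rcond=None)[0][1]

# (b) Slope computed from within-village deviations only
xd, yd = demean_within(x, village), demean_within(y, village)
b_within = (xd @ yd) / (xd @ xd)

print(f"dummy regression: {b_dummy:.4f}, within-village: {b_within:.4f}")
```

The two slopes agree up to floating-point error: anything constant within a village, including $\alpha_v$, drops out of the comparison.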

Here I come to the point. Among the covariates I use, there is one categorical variable that takes several different values. It could represent, for example, colors, insurance companies or ethnicities. In STATA I introduce this variable as `i.categorical`, so the STATA code becomes:

STATA code (2): `reg Y Var1 Var2 i.categorical i.village, vce(cluster village)`
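To make explicit what STATA code (2) estimates, here is a sketch in Python/NumPy on simulated data (the variable names, sample sizes and true coefficients are hypothetical). `i.categorical` and `i.village` each become a set of 0/1 dummies with one base level dropped, the coefficients come from ordinary least squares, and `vce(cluster village)` is mimicked with a cluster-robust sandwich variance. The small-sample factor G/(G-1) * (N-1)/(N-K) is the one we believe `regress` applies; treat it as an assumption rather than a reproduction of STATA's internals.

```python
import numpy as np

def dummies(codes):
    """0/1 indicator columns for every level except the first (the base level)."""
    levels = np.unique(codes)[1:]
    return (codes[:, None] == levels[None, :]).astype(float)

def ols_cluster(X, y, clusters):
    """OLS coefficients and cluster-robust standard errors.

    Assumed small-sample factor: G/(G-1) * (N-1)/(N-K).
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    meat = np.zeros((k, k))
    groups = np.unique(clusters)
    for g in groups:
        m = clusters == g
        score = X[m].T @ resid[m]  # sum over cluster g of x_i * u_i
        meat += np.outer(score, score)
    G = len(groups)
    factor = G / (G - 1) * (n - 1) / (n - k)
    V = factor * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))

rng = np.random.default_rng(1)
n_villages, per_village = 40, 25
village = np.repeat(np.arange(n_villages), per_village)
n = village.size
categorical = rng.integers(0, 4, n)                   # hypothetical 4-level category
var1, var2 = rng.normal(size=n), rng.normal(size=n)
gamma = np.array([0.0, 0.8, -0.4, 1.2])[categorical]  # category shifts (base level = 0)
alpha = rng.normal(0.0, 1.0, n_villages)[village]     # village fixed effects
y = 1.0 + 0.5 * var1 - 0.3 * var2 + gamma + alpha + rng.normal(size=n)

# Design matrix for: reg Y Var1 Var2 i.categorical i.village
X = np.column_stack([np.ones(n), var1, var2, dummies(categorical), dummies(village)])
beta, se = ols_cluster(X, y, village)
print(f"Var1: {beta[1]:.3f} (SE {se[1]:.3f})")
print(f"Var2: {beta[2]:.3f} (SE {se[2]:.3f})")
print("categories 1-3 relative to base:", np.round(beta[3:6], 3))
```

Because the category dummies only shift the intercept, `beta[3:6]` are differences from the base category, and a single slope is estimated for Var1 and for Var2 across all categories and villages.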

I have a hard time interpreting what this regression implies. When running it, am I looking at variation within categories within villages, that is, at variation in Y among individuals who belong to the same category and live in the same village?

Thank you!

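The coefficient on Var1 or Var2 is the effect of that variable on Y holding the category (and the village) fixed. Taken literally, it is the effect for a person in the base category, i.e., when all the categorical dummies are 0; because the dummies only shift the intercept and are not interacted with Var1 or Var2, the model imposes that same effect in every other category as well.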

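You are not looking at variation within each category within a village. The category dummies and the village dummies enter separately, so they remove the average differences between categories and the average differences between villages, but not differences specific to one category in one village. With `vce(cluster village)` you are also assuming that the errors of two observations in the same category but in different villages are independent, whereas the errors of two observations in the same village are allowed to be correlated.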
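If the goal is the comparison asked about in the question, i.e. individuals in the same category and the same village, the category and village dummies have to be interacted (in STATA's factor-variable notation, something like `i.categorical#i.village`). The sketch below, on simulated data with a hypothetical category-by-village shock that is correlated with X, compares the additive specification with one dummy per category-village cell, and checks that the cell-dummy slope equals a regression on deviations from category-within-village means.

```python
import numpy as np

def dummies(codes):
    """0/1 indicator columns for every level except the first (the base level)."""
    levels = np.unique(codes)[1:]
    return (codes[:, None] == levels[None, :]).astype(float)

def demean_within(v, groups):
    """Subtract each group's mean of v from the observations in that group."""
    out = v.astype(float)  # astype returns a copy, so v is not modified
    for g in np.unique(groups):
        mask = groups == g
        out[mask] -= v[mask].mean()
    return out

rng = np.random.default_rng(2)
n_villages, n_cats, per_cell = 15, 3, 40
village = np.repeat(np.arange(n_villages), n_cats * per_cell)
category = np.tile(np.repeat(np.arange(n_cats), per_cell), n_villages)
cell = village * n_cats + category                        # one id per category-village pair
n = cell.size
shock = rng.normal(0.0, 1.5, n_villages * n_cats)[cell]  # hypothetical cell-specific shock
x = rng.normal(size=n) + shock                           # x correlated with the shock
y = 2.0 * x + shock + rng.normal(size=n)                 # true slope = 2

# Additive: analogous to reg Y X i.categorical i.village
X_add = np.column_stack([np.ones(n), x, dummies(category), dummies(village)])
b_add = np.linalg.lstsq(X_add, y, rcond=None)[0][1]

# Interacted: one dummy per category-village cell
X_cell = np.column_stack([np.ones(n), x, dummies(cell)])
b_cell = np.linalg.lstsq(X_cell, y, rcond=None)[0][1]

# Same slope from deviations around category-within-village means
xd, yd = demean_within(x, cell), demean_within(y, cell)
b_within_cell = (xd @ yd) / (xd @ xd)

print(f"additive: {b_add:.3f}  cell dummies: {b_cell:.3f}  within-cell: {b_within_cell:.3f}")
```

With this particular data-generating process the additive slope is pulled away from 2, because the separate category and village dummies absorb only the average category and average village parts of the shock, while the cell dummies absorb all of it. Without such a cell-specific component, the two specifications should give similar slopes.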

The categorical dummies are there to hold the category constant while you look at the effect of the other variables.

Answered by ahorn on December 29, 2020