Cross Validated

Asked by UzbeKistaN on December 21, 2020
I have a panel dataset with countries as the individual units, observed yearly. My analysis is a macroeconomic study and, as often happens in these cases, the data form what is commonly called a "macro panel" (or, if I am not mistaken, a "wide panel"). I have few time observations per country, about 15-24 years, and there is evidence of individual heterogeneity, i.e. of unobserved individual effects.
For such cases the literature on static panel models suggests fixed-effects models, but in the models I am analyzing the Hausman test tells me the opposite, preferring random effects. I don't understand why I get this result. I'm starting to think it is due to the presence of variables that change little or not at all over time, which causes problems in fixed-effects models. However, I consider these regressors important for my analysis, so I would like to know whether it is coherent to choose random effects and how I can justify this choice.
My models also suffer from both cross-sectional dependence and serial correlation. I don't know how much of a problem this is; the only way I'm dealing with it is by using robust standard errors. On the other hand, I have found research related to my work, and most of it is based on dynamic models estimated by the generalized method of moments (GMM) with a lagged dependent variable as a regressor. Unfortunately, I don't know the theory behind GMM well, but I have learned of its versatility, especially in situations like this, which has made it increasingly popular in recent years.
Therefore, given the diagnostic problems I am having with static models, I wonder whether it is more appropriate to use dynamic models or to stay with fixed/random-effects models while taking the necessary precautions, and on what basis I can evaluate this choice, comparing the results obtained with the two approaches (static vs. dynamic).
I hope I was clear.
Thanks for the answer and for editing my post; I recognize that I have many gaps in English. Thanks also for the references you recommended; I will give them a look.
I have never heard of the term "wide panel" before. I assume you are referring to a dataset with some fixed $N$ and small $T$, thus giving the appearance of a wider data frame. I would argue "macro panel" is a more apt description, where you have a reasonably large number of countries over many years (e.g., 20 years or more). The term "macro panel" is used quite frequently in Chapter 12 of Badi Baltagi's Econometric Analysis of Panel Data.
Yes, by "wide panel" I was referring to a dataset with a large $N$ and a short $T$. Reading other forums/sites, I saw some people use this term loosely, perhaps erroneously; it was also unknown to me. However, in recent days I have made some adjustments, and ultimately the dataset I will work on is composed of 30 countries over 15 years.
Regarding the question about the Hausman test and the difference between fixed and random effects, in relation to my dataset and my analysis: I had some doubts about the results precisely because the European countries I'm analyzing are the only ones for which the data I need are available, probably because in other countries the economic-financial context is not sufficiently developed.
To be more specific, my work analyzes the life insurance market, in particular how demand has evolved in Europe in relation to macroeconomic-financial and demographic variables. So the fact that I have found 30 countries is, I think, very satisfactory.
So, returning to the concept of sampling and the choice between the fixed and random model: as you said, and as I expected before obtaining these results, in my context of analysis fixed effects should be the most appropriate.
Being a novice in panel econometrics, my benchmark was the Hausman test, but now I don't know how much to trust it. I could try to remove those quasi-time-invariant variables, but I consider them important because they are mostly demographic variables (life expectancy, population structure indexes, and the employment/unemployment rate). I have several models because I am carrying out model selection based on some goodness-of-fit measures: adjusted R-squared, RMSE, and AIC. Without them I would be left with a model that explains little, both from an interpretative and an R-squared point of view (I know that R-squared is not an absolute measure for evaluating a model, but if its value is very low, at most 0.15, I think I could improve it by adding some regressors).
This is also the reason why, perhaps, a static model has several limits with these variables in this case, which has led the literature (I can cite some sources if necessary) to use GMM for dynamic models.
For my analysis I'm using R and the plm package. I'm reading the vignette you suggested, and I'm also reading the book "Panel Data Econometrics with R" by the same authors.
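In case it is useful, this is roughly how I set things up (the data frame `dat` and the variables `demand`, `gdp`, and `lifeexp` are placeholders for my actual data):

```r
library(plm)

# `dat` has one row per country-year; `demand`, `gdp`, and `lifeexp`
# stand in for my dependent variable and regressors.
pdat <- pdata.frame(dat, index = c("country", "year"))

fe <- plm(demand ~ gdp + lifeexp, data = pdat, model = "within")  # fixed effects
re <- plm(demand ~ gdp + lifeexp, data = pdat, model = "random")  # random effects

# Hausman test: rejecting the null favors fixed effects;
# in my case the test does not reject, preferring random effects.
phtest(fe, re)
```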
I have already tried some GMM models with the pgmm function, but not knowing the theory, it was just an experiment. I don't want to go off-topic, but looking at the summary of the pgmm models I noticed some things, for example some tests, and the fact that the R-squared is not reported (as I said, if I have to do model selection I need some performance measure to compare the different models; maybe if I have to work on GMM in R I will look for answers on Stack Overflow).
What diagnostic problems did you run into? Just because the results of a Hausman test suggest random effects doesn't mean you have a problem. Running a "static" random effects model is perfectly acceptable in my estimation.
By diagnostic problems I meant serial correlation and cross-sectional dependence, which I addressed using a robust covariance matrix.
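Concretely, the tests and the robust covariance matrix I used look roughly like this (continuing the placeholder model from before):

```r
library(lmtest)  # for coeftest()

pbgtest(fe)  # Breusch-Godfrey test for serial correlation in panel models
pcdtest(fe)  # Pesaran CD test for cross-sectional dependence

# Driscoll-Kraay (SCC) standard errors are robust to both
# serial correlation and cross-sectional dependence:
coeftest(fe, vcov = vcovSCC(fe))
```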
Answered by UzbeKistaN on December 21, 2020
My analysis is a macroeconomic study and, as often happens in these cases, the data form what is commonly called a "macro panel" (or, if I am not mistaken, a "wide panel").
I have never heard of the term "wide panel" before. I assume you are referring to a dataset with some fixed $N$ and small $T$, thus giving the appearance of a wider data frame. I would argue "macro panel" is a more apt description, where you have a reasonably large number of countries over many years (e.g., 20 years or more). The term "macro panel" is used quite frequently in Chapter 12 of Badi Baltagi's Econometric Analysis of Panel Data.
For such cases the literature on static panel models suggests fixed-effects models, but in the models I am analyzing the Hausman test tells me the opposite, preferring random effects. I don't understand why I get this result. I'm starting to think it is due to the presence of variables that change little or not at all over time, which causes problems in fixed-effects models.
The results from your Hausman test suggest random effects; in short, that your unique errors are not correlated with your regressors. You might favor a random effects estimator if some of these time-invariant or "slow moving" regressors are of substantive interest. In a fixed effects model, all time-constant variables included in your model will be collinear with the country-specific effect and summarily dropped; this is because a fixed effects estimator only uses the time-series variation within each country. A random effects estimator, on the other hand, will exploit some "between unit" (i.e., cross-country) variation, and thus any time-constant variables may remain. In fact, the random effects estimator is akin to a weighted average of the "within" and "between" estimators, as the sketch below illustrates.
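To make the contrast concrete, here is a minimal sketch using the plm package in R; the data frame `pdat` and the variables `demand`, `gdp`, and the time-invariant dummy `legal` are hypothetical stand-ins:

```r
library(plm)

# `legal` does not vary over time within a country, so the "within"
# estimator drops it; "between" and "random" can retain it.
fe  <- plm(demand ~ gdp + legal, data = pdat, model = "within")
btw <- plm(demand ~ gdp + legal, data = pdat, model = "between")
re  <- plm(demand ~ gdp + legal, data = pdat, model = "random")

ercomp(re)  # variance components and the weighting behind the random effects estimator
```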
You also state that the presence of these time-constant variables "causes problems in fixed-effects models." I disagree with this statement. The attractiveness of fixed effects estimators is that they will 'partial out' the effects of all time-constant variables—even those you have not explicitly measured (or even thought of). If you proceed with a random effects model, then your country-specific effect is treated as random and is assumed to be uncorrelated with your explanatory variables. Is this a reasonable assumption in your setting? In practice, this assumption is often incorrect. It is unlikely that the true correlation between the unit (i.e., country) effects and your covariates is exactly zero. Quoting from Clark & Linzer (2015):
[I]f the Hausman test fails to reject the null hypothesis of orthogonality, it is most likely not because the true correlation is zero....Rather, it is likely that the test has insufficient statistical power to reliably distinguish a small correlation from zero correlation....Of course, in many cases, a biased (random-effects) estimator can be preferable to an unbiased (fixed-effects) estimator if the former provides sufficient variance reduction over the latter. The Hausman test does not help evaluate this trade-off.
I should also note that the advice you receive from others might be discipline-specific. I once conferred with an epidemiologist on a project and he was more than happy to disregard the results from a Hausman test. Applied econometricians, on the other hand, might be more predisposed to let the results from a Hausman test guide their approach. I'm an expert in neither field of study, so I shouldn't speak for entire disciplines. Maybe someone from one of those fields will jump in and set me straight on that one.
It is difficult to offer guidance without more detail about the theoretical model under consideration. In the comments I drew your attention to the notion of generalizability outside of your sample of countries. It might be more appropriate to treat the unit-specific effect as "random" if you're sampling a subset of units from a larger, unobserved population. The units can be individuals, hospitals, precincts, counties, et cetera. Sometimes we are interested in the units outside of our sample, and other times we only care about the sample at hand. I think this is important to consider in your situation. Suppose you wish to survey the attitudes, beliefs, and perceptions of college students about some important social issue over time across a diverse range of campuses in the United States. Scarce funding for the project limits your ability to obtain repeated measures across time at all universities, so you decide to observe a subset of campuses instead. If your aim is to generalize to all campuses in the United States (or any broader campus population outside of your sample), then treating the "campus" effect as random might be preferred. Or maybe you already sampled all observations from the relevant population; there are many examples of this in the real world.
For example, suppose I was hired to evaluate a new policy implemented by a large metropolitan police department. A subset of law enforcement districts implement the policy and others do not. Now suppose police officials want to know if the new policy/directive was effective at lowering the crime rate. To begin, I might survey all districts that comprise that agency before and after the policy exposure period. In this case, I'm sampling the entire population of law enforcement units, which includes all treated and untreated districts. Again, I only care about the districts comprising one large metropolitan agency. If I only wish to make statements about that particular agency, then I might proceed with a fixed effects estimator, or some derivative of it. Now suppose I sample a much broader array of police districts in a particular geographic region of the United States and I want to make statements about all districts in the entire country. In this setting, I might want to treat the district as a random effect.
In your setting, it appears you're only interested in European countries. I don't presume your results will guide policy or be applicable to Asian markets—or maybe they will. As I see it, you're only sampling a subset of the 44 European countries (though to be precise, it might be 49 countries if you're considering the Eurasian Caucasus region as part of the European continent). As you already noted, your results shouldn't change much if you obtained the full population of European countries. If you're only interested in the subset of European markets, then maybe modeling the country effect as 'fixed' is the better way to go.
Therefore, given the diagnostic problems I am having with static models, I wonder whether it is more appropriate to use dynamic models or to stay with fixed/random-effects models while taking the necessary precautions, and on what basis I can evaluate this choice, comparing the results obtained with the two approaches (static vs. dynamic).
What diagnostic problems did you run into? Just because the results of a Hausman test suggest random effects doesn't mean you have a problem. Running a "static" random effects model is perfectly acceptable in my estimation.
It is also reasonable for you to explore a dynamic model. A very good predictor of behavior or economic activity at time $t$ is its value in period $t-1$. But a dynamic model introduces a new set of problems. To be clear, by "dynamic" I mean including a lagged dependent variable(s) on the right-hand side of your equation. I would caution you against modeling "country" as either fixed or random while also including a lagged dependent variable as a predictor. Under random effects, the lagged version of your outcome will be correlated with the random effect that is part of your error term; under fixed effects, the within transformation makes the lagged outcome correlated with the demeaned error, producing the well-known Nickell bias when $T$ is small. A sketch of the problematic specification follows.
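To be explicit about what to avoid (using the same hypothetical variable names as above), the naive specification looks like this:

```r
# Naive dynamic fixed effects: with small T, lag(demand) is correlated
# with the demeaned error term, so the "within" estimate of the lag
# coefficient is biased (Nickell bias).
naive <- plm(demand ~ lag(demand) + gdp, data = pdat, model = "within")
```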
I also encourage you to read Paul Allison's blog regarding the problems associated with using a lagged outcome as a predictor in panel data models. The discussion is quite interesting. In sum, there is no single correct answer to your question. I wouldn't stray too far from other applied work addressing your specific research question. Here is a recent paper by Leszczensky & Wolbring (2019) that offers some alternative approaches to addressing reverse causality in panel data contexts.
I have found research related to my work, and most of it is based on dynamic models estimated by the generalized method of moments (GMM) with a lagged dependent variable as a regressor.
You didn't cite any empirical work in your question, but there is a strong literature showing that the generalized method of moments (GMM) estimator generalizes to panel data with spatially and temporally correlated error components. Since you're working in R, you could run a dynamic panel model using GMM fairly easily. The GMM estimator is provided by the pgmm() function in the plm package. Its main argument describes the variables of the model and the lag structure, historically via a dynformula and, in recent versions of the package, via a multi-part formula (see the plm documentation for more information). I believe this function mirrors xtabond2 from Stata. I can't speak for all software packages, but I believe you could implement such a model fairly easily based on the associated documentation.
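Here is a sketch adapted from the Arellano-Bond example in the plm documentation, using the EmplUK data that ship with the package; everything after the | in the formula declares the GMM instruments:

```r
library(plm)
data("EmplUK", package = "plm")

# Employment equation with a lagged dependent variable; the instruments
# are lags 2 and deeper of log employment, as in Arellano and Bond (1991).
ab <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + log(capital)
           | lag(log(emp), 2:99),
           data = EmplUK, effect = "twoways", model = "twosteps")

summary(ab)  # reports the Sargan test and AR(1)/AR(2) tests; no R-squared is given
```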
I hope this helps!
Answered by Thomas Bilach on December 21, 2020