Exploratory factor analysis in a panel setting

Question

We would like to apply an exploratory factor analysis (EFA) in a panel setting, i.e. where variables are observed within an individual over time.
In principle, two ways of doing this come to mind:
a. Apply EFA on a yearly basis and see if the factor structure is similar.
b. Apply EFA on the whole data set, ignoring its panel structure.
While a appears quite tedious and difficult to implement, especially if the number of time periods is high, b seems to ignore essential aspects of panel data: the number of observations differs across years, and time or individual fixed effects may well influence the factor structure.
If a leads to stable results, b could be applied with fewer worries. If it does not, one could obtain factor scores for each time period. In Stata, however, this process would be quite cumbersome, as it would mean splitting the data set by period, scoring the factors separately, and appending everything back together.
Is there a routine or a more straightforward way to conduct this type of analysis?

dcoy · Answer

I'm currently trying to make a similar decision and will provide a partial answer in hopes that I can be helpful and/or stoke the conversation enough to bring in a better answer, as I have searched thoroughly and can't find a post that answers your (our) question completely.
My short answer is that you should also consider individual-level demeaning and/or first-differencing of the variables before pooling the data and extracting factors to see whether this produces different results from the factors that emerge from the raw data.
In scenario "a", you run the risk of considering only between-individual relationships among variables in the extraction of the factors. In scenario "b" you run the risk of an unidentified mixture of both between/within, which might or might not be a problem, depending on the nature of the analysis and the data.
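One way to make the "mixture" in scenario "b" concrete is the standard between/within decomposition of the pooled cross-product matrix. The notation is introduced here only for illustration: $x_{it}$ is the vector of observed variables for individual $i$ in period $t$, $\bar{x}_i$ is that individual's mean over its $T_i$ observed periods, and $\bar{x}$ is the grand mean over all observations.

```latex
\underbrace{\sum_{i}\sum_{t=1}^{T_i}(x_{it}-\bar{x})(x_{it}-\bar{x})^{\top}}_{\text{pooled (total)}}
=
\underbrace{\sum_{i} T_i\,(\bar{x}_i-\bar{x})(\bar{x}_i-\bar{x})^{\top}}_{\text{between}}
+
\underbrace{\sum_{i}\sum_{t=1}^{T_i}(x_{it}-\bar{x}_i)(x_{it}-\bar{x}_i)^{\top}}_{\text{within}}
```

A pooled EFA works from the left-hand side, so its factors reflect whichever term dominates. Demeaning by individual keeps only the within term, while averaging each individual over the panel keeps (essentially) only the between term.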
If factors are expected to capture between-individual attributes only, it seems defensible to average individuals over the panel before extracting. It would be more rigorous to use your option "a". An even more rigorous option would be to determine plausible factor components, perhaps through option "a", and confirm that each factor is a good fit, using a longitudinal CFA model. The best resource for the latter that I have found is Chapter 5 in Todd Little's Longitudinal Structural Equation Modeling. This is the overview of Chapter 5 in Little (2013, 137):

The longitudinal CFA model addresses a number of important questions about the model, the data, and the sample. These questions include: (1) Are the measurements of each construct factorially invariant across measurement occasions?, (2) how stable are the cross-time relations of the constructs?, (3) how stable are the within-time relations among the constructs?, (4) have the constructs’ variances changed over time?, and (5) have the constructs’ mean levels changed over time? In this chapter, I describe how to assess each of these questions.
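
Before moving to a longitudinal CFA, the exploratory year-by-year comparison in option "a" does not have to involve splitting and re-appending the data by hand. The following is a minimal Python sketch (not Stata), under several assumptions: the data are simulated, the column names `year`, `x1`, `x2`, `x3` are hypothetical, and principal-component extraction from the correlation matrix is used as a simple stand-in for whatever EFA extraction method you prefer. It loops over years in place, stores the yearly loadings, compares them with Tucker's congruence coefficient, and writes the yearly scores back into the same data set.

```python
import numpy as np
import pandas as pd

def pc_loadings(x, n_factors=1):
    """Loadings from an eigen-decomposition of the correlation matrix.

    Principal-component extraction, used here as a stand-in for EFA.
    """
    corr = np.corrcoef(x, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)
    top = np.argsort(eigval)[::-1][:n_factors]
    load = eigvec[:, top] * np.sqrt(eigval[top])
    # Eigenvector signs are arbitrary: orient each column to a positive sum.
    signs = np.sign(load.sum(axis=0))
    signs[signs == 0] = 1.0
    return load * signs

def congruence(a, b):
    """Tucker's congruence coefficient between two loading vectors."""
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def component_scores(x, load):
    """Unit-variance scores: standardized data projected onto the loadings."""
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    return z @ load / (load ** 2).sum(axis=0)

# Simulated unbalanced panel (assumption): one stable factor, and a
# different number of observations in each year.
rng = np.random.default_rng(1)
frames = []
for year, n_obs in zip(range(2001, 2006), [150, 220, 180, 300, 260]):
    f = rng.normal(size=n_obs)
    frames.append(pd.DataFrame({
        "year": year,
        "x1": 0.8 * f + 0.6 * rng.normal(size=n_obs),
        "x2": 0.7 * f + 0.7 * rng.normal(size=n_obs),
        "x3": 0.6 * f + 0.8 * rng.normal(size=n_obs),
    }))
panel = pd.concat(frames, ignore_index=True)
cols = ["x1", "x2", "x3"]

# Extract and score year by year without splitting the data set.
yearly_loadings = {}
scores = pd.Series(np.nan, index=panel.index)
for year, group in panel.groupby("year"):
    x = group[cols].to_numpy(dtype=float)
    load = pc_loadings(x, n_factors=1)
    yearly_loadings[year] = load[:, 0]
    scores.loc[group.index] = component_scores(x, load)[:, 0]
panel["f1_yearly"] = scores

# Compare every year's loadings with the first year's loadings.
first_year = min(yearly_loadings)
stability = {year: congruence(yearly_loadings[first_year], load)
             for year, load in yearly_loadings.items()}
print(pd.Series(stability).round(3))
```

Congruence values close to 1 in every year would suggest a stable structure. Note that scores standardized within each year are not comparable in level across years, which is one reason the invariance questions in the quoted overview matter.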

If your analysis uses panel data as a means of establishing causal relationships related to within-unit changes over time, both approaches you mention are vulnerable to problems. At a minimum, you would want to check that the factors extracted from individual-level demeaned or differenced variables are similar to those extracted from the raw data in your scenarios. If you are interested in factors that capture within-individual change using the raw data, your raw data would need to be i.i.d. (if I'm not mistaken), which is often not the case in panel data. If they are not i.i.d., the factors that emerge are likely to be a mixture of between-individual and within-individual factors, which could confound the EFA. For example, if some individuals are systematically higher or lower on a variable on average over the panel, the within-individual variation on that variable is unlikely to emerge as a factor together with the "correct" within-individual deviations in the same direction on other variables. A within-individual increase or decrease will not register "correctly" as higher or lower for that individual, because the resulting value can still sit well below or above the values of other individuals who deviated in the same direction.
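The comparison between raw, demeaned, and differenced data can be sketched as follows. This is an illustration under stated assumptions, not a definitive procedure: the data are simulated, the column names `id`, `year`, `x1`, `x2`, `x3` and both helper functions are hypothetical, and only the correlation matrices that an EFA would start from are compared. In the simulation, `x1` and `x2` share a stable person-level effect (between), while `x2` and `x3` share period-specific shocks (within).

```python
import numpy as np
import pandas as pd

def within_demean(df, id_col, cols):
    """Subtract each individual's own mean from every variable."""
    out = df.copy()
    out[cols] = df[cols] - df.groupby(id_col)[cols].transform("mean")
    return out

def first_difference(df, id_col, time_col, cols):
    """Period-to-period change within each individual (first period dropped)."""
    out = df.sort_values([id_col, time_col]).copy()
    out[cols] = out.groupby(id_col)[cols].diff()
    return out.dropna(subset=cols)

# Simulated balanced panel (assumption).
rng = np.random.default_rng(0)
n_ids, n_years = 200, 6
ids = np.repeat(np.arange(n_ids), n_years)
years = np.tile(np.arange(2000, 2000 + n_years), n_ids)
person = rng.normal(size=n_ids)[ids]        # between effect in x1 and x2
shock = rng.normal(size=ids.size)           # within shock in x2 and x3
panel = pd.DataFrame({
    "id": ids,
    "year": years,
    "x1": 2 * person + rng.normal(size=ids.size),
    "x2": 2 * person + shock + rng.normal(size=ids.size),
    "x3": shock + rng.normal(size=ids.size),
})
cols = ["x1", "x2", "x3"]

demeaned = within_demean(panel, "id", cols)
differenced = first_difference(panel, "id", "year", cols)

# Correlation matrices that a pooled EFA would be run on in each case.
corr_raw = panel[cols].corr()
corr_within = demeaned[cols].corr()
corr_diff = differenced[cols].corr()

print("raw:\n", corr_raw.round(2))
print("demeaned:\n", corr_within.round(2))
print("differenced:\n", corr_diff.round(2))
```

In this setup the raw data are dominated by the `x1`/`x2` association, which comes entirely from level differences between individuals. After demeaning or differencing, that association should largely vanish and the `x2`/`x3` association, which reflects change within individuals, becomes the main structure. If the three matrices look alike in real data, pooling the raw data is less of a concern.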
A potential additional problem with approach "b" is that the data could be non-stationary. In this case, components can spuriously load as a factor when they increase, decrease or are shocked together over the course of the panel.
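A small simulation illustrates the point. It is a deliberately simplified sketch: the three variables are generated independently of each other apart from the fact that each drifts upward over time, and deterministic trends are used instead of unit-root processes to keep the example short. All names and parameter values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_ids, n_years, n_vars = 100, 20, 3
t = np.arange(n_years)
slopes = np.array([0.5, 0.4, 0.6])   # each variable trends upward on its own

# levels[i, t, k] = slope_k * t + independent noise; no common factor exists.
noise = rng.normal(size=(n_ids, n_years, n_vars))
levels = t[None, :, None] * slopes[None, None, :] + noise

def offdiag(corr):
    """Off-diagonal entries of a square correlation matrix."""
    return corr[~np.eye(corr.shape[0], dtype=bool)]

# Pooled correlations in levels: driven by the shared passage of time.
corr_levels = np.corrcoef(levels.reshape(-1, n_vars), rowvar=False)

# Pooled correlations of first differences within each individual.
diffs = np.diff(levels, axis=1)
corr_diffs = np.corrcoef(diffs.reshape(-1, n_vars), rowvar=False)

print("levels:     ", offdiag(corr_levels).round(2))
print("differences:", offdiag(corr_diffs).round(2))
```

The pooled levels are strongly correlated, so an EFA on them would tend to report one dominant "factor" that is really just time. The differenced series should be close to uncorrelated, which is what the data-generating process implies.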
You might find more specific solutions in Bai and Ng (2004) and/or Onatski and Wang (2020). These discuss the latter two issues in detail.
I hope someone else can weigh in and provide a more definitive answer.
