Difference between Linear Mixed Regression and Generalized Estimating Equation Results

Question

I am using commonly available iris dataset and trying to do following regression:
PW ~ PL + SL + SW
Since samples are taken from 3 "Species", this is kept as random or group variable.
The results of Linear Mixed Regression are:
        Mixed Linear Model Regression Results
=====================================================
Model:            MixedLM Dependent Variable: PW     
No. Observations: 150     Method:             REML   
No. Groups:       3       Scale:              0.0278 
Min. group size:  50      Log-Likelihood:     41.4680
Max. group size:  50      Converged:          Yes    
Mean group size:  50.0                               
-----------------------------------------------------
           Coef.  Std.Err.   z    P>|z| [0.025 0.975]
-----------------------------------------------------
Intercept   0.082    0.335  0.245 0.807 -0.575  0.740
SL         -0.098    0.045 -2.199 0.028 -0.186 -0.011
SW          0.238    0.048  4.975 0.000  0.144  0.332
PL          0.257    0.050  5.139 0.000  0.159  0.355
Group Var   0.257    1.636                           
=====================================================

While the results of GEE regression are:
                               GEE Regression Results                              
===================================================================================
Dep. Variable:                          PW   No. Observations:                  150
Model:                                 GEE   No. clusters:                        3
Method:                        Generalized   Min. cluster size:                  50
                      Estimating Equations   Max. cluster size:                  50
Family:                           Gaussian   Mean cluster size:                50.0
Dependence structure:         Independence   Num. iterations:                     2
Date:                     Thu, 16 Jul 2020   Scale:                           0.037
Covariance type:                    robust   Time:                         02:42:49
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.2403      0.151     -1.595      0.111      -0.536       0.055
SL            -0.2073      0.088     -2.349      0.019      -0.380      -0.034
SW             0.2228      0.073      3.036      0.002       0.079       0.367
PL             0.5241      0.049     10.711      0.000       0.428       0.620
==============================================================================
Skew:                          0.2232   Kurtosis:                       0.9437
Centered skew:                -0.2824   Centered kurtosis:              1.2493
==============================================================================
=============== cov_struct.summary() ===============
Observations within a cluster are modeled as being independent.

Although P-values for all 3 predictor variables are significant in both, they are different in 2 analyses.
Moreover, the coefficients are quite different:

Which of these analyses is more appropriate and acceptable? Thanks for your insight.

Robert Long · Accepted Answer

When I fit these models in R I get very similar estimates to those that you obtained:

> data("iris")

> # lmm
> m.lmm <- lmer(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length + (1|Species), data = iris)
> m.gee <- geeglm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length, id = Species, data = iris, corstr = "independence")
> summary(m.lmm)

Fixed effects:
             Estimate Std. Error t value
(Intercept)    0.0821     0.3356    0.24
Sepal.Length  -0.0984     0.0444   -2.22
Sepal.Width    0.2380     0.0477    4.99
Petal.Length   0.2567     0.0478    5.37

> summary(m.gee)

 Coefficients:
             Estimate Std.err   Wald Pr(>|W|)    
(Intercept)   -0.2403  0.1506   2.55   0.1106    
Sepal.Length  -0.2073  0.0882   5.52   0.0188 *  
Sepal.Width    0.2228  0.0734   9.22   0.0024 ** 
Petal.Length   0.5241  0.0489 114.72   <2e-16 ***

The diffeence is mostle due to using independence as the correlation structure. To be equivalent to the mixed model you should use exchangable:

> m.gee1 <- geeglm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length, id = Species, data = iris, corstr="exchangeable")
> summary(m.gee1)

 Coefficients:
             Estimate Std.err  Wald Pr(>|W|)    
(Intercept)    0.0767  0.1960  0.15    0.695    
Sepal.Length  -0.1015  0.0254 16.02  6.3e-05 ***
Sepal.Width    0.2357  0.0958  6.06    0.014 *  
Petal.Length   0.2647  0.0332 63.45  1.7e-15 ***

Exchangeable correlation structure means that the residual covariance between all species is the same, which is the same assumption as in mixed effects models.

Difference between Linear Mixed Regression and Generalized Estimating Equation Results

One Answer

Add your own answers!

Ask a Question