# Why are GEE estimates smaller than GLMM estimates?

Cross Validated Asked on December 7, 2020

Both are estimators that maximize the marginal likelihood, only GLMM does so by first considering the conditional probability, while GEE assumes a covariance structure of the marginal probability directly. So why should the coefficients be systematically different from each other (GEE gives coefficients that are smaller in magnitude)?

This actually depends on the link function: for example, for a log link there is no systematic difference in the slope coefficients, but for a logit link there is.

The reason is that the models are systematically different and the marginal likelihoods are systematically different. As the simplest example consider a logistic GLMM with a random intercept, for longitudinal data indexed by person $$i$$ and time $$t$$

$$\mathrm{logit}\, E[Y_{it}|X_{it}=x, a_i] = a_i+x\beta$$ where $$a_i\sim N(\alpha, \tau^2)$$

The GEE marginal mean model is

$$\mathrm{logit}\, E[Y_{it}|X_{it}=x]=\tilde\alpha+x\tilde\beta$$

So how are $$\beta$$ and $$\tilde\beta$$ related? Well, the GLMM has $$E[Y_{it}|X_{it}=x, a_i] = \mathrm{expit}\,(a_i+x\beta)$$ so $$E[Y_{it}|X_{it}=x] = E_a[E[Y_{it}|X_{it}=x, a_i]]=E_a[\mathrm{expit}\,(a_i+x\beta)]$$ so $$\mathrm{logit}\, E[Y_{it}|X_{it}=x] = \mathrm{logit}\,E_a[\mathrm{expit}\,(a_i+x\beta)]$$

The GEE has $$\mathrm{logit}\, E[Y_{it}|X_{it}=x]=\tilde\alpha+x\tilde\beta$$

These would be the same if expectations and $$\mathrm{expit}$$ commuted, but they don't. For a log link, the $$\beta$$ would be the same, because you can take an $$e^{x\beta}$$ multiplier through the expectation, but the $$\alpha$$ would be systematically different.
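To spell out the log-link claim, here is a short worked calculation. It assumes the same random-intercept setup as above, $$a_i\sim N(\alpha,\tau^2)$$, with the logit replaced by a log link, and it uses the standard formula for the mean of a lognormal variable:

$$E[Y_{it}|X_{it}=x] = E_a[\exp(a_i+x\beta)] = e^{x\beta}\,E_a[e^{a_i}] = \exp\left(\alpha+\tfrac{\tau^2}{2}+x\beta\right)$$

Under these assumptions the marginal model is again log-linear, with $$\tilde\beta=\beta$$ and $$\tilde\alpha=\alpha+\tau^2/2$$: only the intercept shifts.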

Ok, so we know $$\beta\neq\tilde\beta$$ (for the true parameters, not just the estimates). Why is $$|\beta|>|\tilde\beta|$$?

I think this is easiest with a picture:

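```r
# Inverse logit
expit <- function(x) exp(x) / (1 + exp(x))

x <- seq(-6, 6, length.out = 50)
eta_c <- 0 + 1 * x
mu_c <- expit(eta_c)
# Empty plot (type = "n") just to set up the axes
plot(x, mu_c, ylab = "P(Y=1)", lwd = 2, type = "n", xlim = c(-6, 6))

# 20 random intercepts a_i ~ N(0, 2^2)
a <- rnorm(20, sd = 2)

total_m <- numeric(50)
for (ai in a) {
  eta_c <- ai + 0 + 1 * x   # conditional linear predictor for this a_i
  mu_c <- expit(eta_c)
  lines(x, mu_c, col = "grey")
  total_m <- total_m + mu_c
}

# Average of the 20 conditional curves: the population-average curve
mu_m <- total_m / 20
lines(x, mu_m, col = "blue")
```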


What we see here is 20 realisations in grey of the conditional mean functions for 20 random $$a_i$$, and the blue curve that is the average of the grey curves, which is the GEE mean curve. They are basically the same shape, but the population-average curve is flatter; $$\tilde\beta<\beta$$.

The grey curves are all the same shape. The derivative of $$p= \mathrm{expit}\,\eta$$ with respect to $$\eta$$ is $$\frac{\partial p}{\partial\eta} = \mathrm{expit}\,\eta\,(1-\mathrm{expit}\,\eta)=p(1-p)$$ so $$\frac{\partial p}{\partial x} = p(1-p)\frac{\partial\eta}{\partial x}=p(1-p)\beta$$. That is, the grey curves all have slope $$\beta/4$$ where they cross $$p=0.5$$ and the blue curve will have slope $$\tilde\beta/4$$.
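As a rough numerical check of the flattening, the sketch below replaces the average over 20 simulated intercepts with a numerical integral over the Normal distribution of $$a_i$$. It assumes the same values as the simulation ($$\alpha=0$$, $$\beta=1$$, $$\tau=2$$); `marginal_p` is a helper name introduced here for illustration, not a function from any package.

```r
# Population-average probability at covariate value x, obtained by integrating
# the conditional mean expit(a + x * beta) over a ~ N(alpha, tau^2).
# marginal_p is a hypothetical helper; the defaults match the simulation above.
marginal_p <- function(x, alpha = 0, beta = 1, tau = 2) {
  integrate(
    function(a) plogis(a + x * beta) * dnorm(a, mean = alpha, sd = tau),
    lower = -Inf, upper = Inf
  )$value
}

xs <- c(-1, 0, 1)
p_m <- sapply(xs, marginal_p)

# Logit-scale slope of the marginal curve between x = -1 and x = 1.
# For every conditional (grey) curve the same quantity is exactly beta = 1.
slope_conditional <- 1
slope_marginal <- (qlogis(p_m[3]) - qlogis(p_m[1])) / 2

slope_marginal
slope_marginal < slope_conditional
```

With these settings the marginal slope should come out clearly below 1, in line with the blue curve being flatter than the grey ones. Because the marginal curve is not exactly logistic, the value depends somewhat on which pair of `x` values is used.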

One issue I've avoided here is that the GEE and GLMM logistic models are incompatible; they can't both be exactly true. But you could pretend that I used a probit link instead, where they are compatible, or that I'd looked up the relevant bridge distribution to replace the Normal distribution for $$a_i$$.
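For the probit case mentioned here, the compatibility can be written out explicitly. This is a sketch under the assumption that the conditional model is $$\Phi^{-1}E[Y_{it}|X_{it}=x,a_i]=a_i+x\beta$$ with $$a_i\sim N(\alpha,\tau^2)$$, using the standard identity $$E[\Phi(Z)]=\Phi\left(m/\sqrt{1+s^2}\right)$$ for $$Z\sim N(m,s^2)$$:

$$E[Y_{it}|X_{it}=x]=E_a[\Phi(a_i+x\beta)]=\Phi\left(\frac{\alpha+x\beta}{\sqrt{1+\tau^2}}\right)$$

So the marginal model is again a probit model, with $$\tilde\beta=\beta/\sqrt{1+\tau^2}$$, which is smaller in magnitude than $$\beta$$ whenever $$\tau^2>0$$.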

Correct answer by Thomas Lumley on December 7, 2020