Cross Validated Asked by mkn1 on November 2, 2021
I’m looking at patterns of a particular injury within individuals and how they vary by age and sex. For each of 1365 individuals I have four locations each of which may be positive for this injury.
sub_id, age, sex, bone, side, outcome
2250, 21, f, tibial, lateral, TRUE
2250, 21, f, tibial, medial, FALSE
2250, 21, f, femoral, lateral, TRUE
2250, 21, f, femoral, medial, FALSE
2258, 21, m, tibial, lateral, FALSE
The relationship appears to be non linear. The figure below shows the actual data by age and sex for one location.
I first attempted to model these data using GAM, figure 2.
gam.model <-
gam(
outcome ~ bone + side + s(age, by = sex) + bone * sex + side * sex,
family = binomial,
data = my_data
)
But this does not account for the repeated measures within each person. GAMM could do this but apparently does not do well with binomial data. Someone suggested I try a GEE model, model the sexes separately, and use splines::ns
for the non-linearity.
gee.model.m <-
gee::gee(
outcome ~ bone + side + splines::ns(age, df = 5),
id = sub_id,
corstr = "exchangeable",
family = binomial,
data = my_data[sex == "m"][order(sub_id)]
)
This does allow me to include the repeated measures information. But I also have to choose df for the splines. My choice of df=5 is random and this choice strongly affects the resultant model. Is this an appropriate model to use? If so how to I choose df? Is there a way of comparing models to see which one is best?
There are a couple of options with GAMs that allow for smoothness selection which will avoid the issue of having to set the degrees of freedom for the splines in the GEE code you show.
The easiest one if you don't have complex random effects or large numbers of subjects, is to use the random effect "smooth" for use with gam()
:
m <- gam(outcome ~ bone + side + sex + bone:sex + side:sex +
s(age, by = sex) + s(sub_id, bs = 're'),
family = binomial, data = my_data, method = "REML")
assuming sub_id
is a factor coding for the subjects that you have repeatedly observed.
The second option, which is appropriate if you have more complex random effect settings as gam()
will get slow with large numbers of subjects or many different random effects, is to use gamm4::gamm4()
. This function uses mgcv smooths (s()
and t2()
; not te()
, ti()
unfortunately) but fits the model in its mixed model form using the glmer()
from the lme4 package which uses methods that are much better behaved than the PQL-based method exposed via gamm()
for binomial data or low count Poisson. The main advantage of just using gam()
(the first solution) is that you have a much wider array of families available for the conditional distribution of the repsonse.
Answered by Gavin Simpson on November 2, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP