TransWikia.com

Dealing with singular fit in mixed models

Cross Validated Asked by User33268 on December 25, 2021

Let’s say we have a model

mod <- Y ~ X*Condition + (X*Condition|subject)

# Y = logit variable  
# X = continuous variable  
# Condition = values A and B, dummy coded; the design is repeated 
#             so all participants go through both Conditions  
# subject = random effects for different subjects 

summary(model)
Random effects:
 Groups  Name             Variance Std.Dev. Corr             
 subject (Intercept)      0.85052  0.9222                    
         X                0.08427  0.2903   -1.00            
         ConditionB       0.54367  0.7373   -0.37  0.37      
         X:ConditionB     0.14812  0.3849    0.26 -0.26 -0.56
Number of obs: 39401, groups:  subject, 219

Fixed effects:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)       2.49686    0.06909   36.14  < 2e-16 ***
X                -1.03854    0.03812  -27.24  < 2e-16 ***
ConditionB       -0.19707    0.06382   -3.09  0.00202 ** 
X:ConditionB      0.22809    0.05356    4.26 2.06e-05 ***

Here we observe a singular fit, because the correlation between intercept and x random effects is -1. Now, according to this helpful link one way to deal with this model is to remove higher-order random effects (e.g., X:ConditionB) and see whether that makes a difference when testing for singularity. The other is to use the Bayesian approach, e.g., the blme package to avoid singularity.

What is the preffered method and why?

I am asking this because using the first or the second one leads to different results – in the first case, I will remove the X:ConditionB random effect and won’t be able to estimate the correlation between X and X:ConditionB random effects. On the other hand, using blme allows me to keep X:ConditionB and to estimate the given correlation. I see no reason why I should even use the non-bayesian estimations and remove random effects when singular fits occur when I can estimate everything with the Bayesian approach.

Can someone explain to me the benefits and problems using either method to deal with singular fits?

Thank you.

2 Answers

This is a very interesting thread, with interesting answers and comments! Since this hasn't been brought up yet, I wanted to point out that we have very little data for each subject (as I understand it). Indeed, each subject has only two values for each of the response variable Y, categorical variable Condition and continuous variable X. In particular, we know that the two values of Condition are A and B.

If we were to pursue the two-stage regression modelling instead of the mixed effects modelling, we couldn't even fit a linear regression model to the data from a specific subject, as illustrated in the toy example below for one of the subject:

y <- c(4, 7)
condition <- c("A", "B")
condition <- factor(condition)
x <- c(0.2, 0.4)

m <- lm(y ~ condition*x)
summary(m)

The output of this subject-specific model would be:

Call:
lm(formula = y ~ condition * x)

Residuals:
ALL 2 residuals are 0: no residual degrees of freedom!

Coefficients: (2 not defined because of singularities)
         Estimate Std. Error t value Pr(>|t|)
(Intercept)         4         NA      NA       NA
conditionB          3         NA      NA       NA
x                  NA         NA      NA       NA
conditionB:x       NA         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1,     Adjusted R-squared:    NaN 
F-statistic:   NaN on 1 and 0 DF,  p-value: NA

Notice that the model fit suffers from singularities, as we're trying to estimate 4 regression coefficients plus the error standard deviation using just 2 observations.

The singularities would persist even if we observed this subject twice - rather than once - under each condition. However, if we observed the subject 3 times under each condition, we would get rid of singularities:

y <- c(4, 7, 3, 5, 1, 2)
condition <- c("A", "B", "A","B","A","B")
condition <- factor(condition)
x <- c(0.2, 0.4, 0.1, 0.3, 0.3, 0.5)

m2 <- lm(y ~ condition*x)
summary(m2)

Here is the corresponding R output for this second example, from which the singularities have disappeared:

>     summary(m2)

Call:
lm(formula = y ~ condition * x)

Residuals:
    1       2       3       4       5       6 
1.3333  2.3333 -0.6667 -1.1667 -0.6667 -1.1667 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)     4.667      3.555   1.313    0.320
conditionB      6.000      7.601   0.789    0.513
x             -10.000     16.457  -0.608    0.605
conditionB:x   -5.000     23.274  -0.215    0.850

Residual standard error: 2.327 on 2 degrees of freedom
Multiple R-squared:  0.5357,    Adjusted R-squared:  -0.1607 
F-statistic: 0.7692 on 3 and 2 DF,  p-value: 0.6079

Of course, the mixed effects model does not fit unrelated, separate linear regression models for each subject - it fits "related" models whose intercepts and/or slopes deviate randomly about a typical intercept and/or slope, such that the random deviations from the typical intercept and/or typical slope follow a Normal distribution with mean zero and some unknown standard deviation.

Even so, my intuition suggests that the mixed effects model is struggling with the small amount of observations - just 2 - available for each subject. The more the model is loaded with random slopes, the more it probably struggles. I suspect that, if each subject contributed 6 observations instead of 2 (that is, 3 per condition), it would no longer struggle to accommodate all of the random slopes.

It seems to me that this could be (?) a case where the current study design does not support the complex modelling ambitions - to support those ambitions, more observations would be needed under each condition for each subject (or at least for some of the subjects?). This is just my intuition so I hope others can add their insights to my observations above. Thank you in advance!

Answered by Isabella Ghement on December 25, 2021

When you obtain a singular fit, this is often indicating that the model is overfitted – that is, the random effects structure is too complex to be supported by the data, which naturally leads to the advice to remove the most complex part of the random effects structure (usually random slopes). The benefit of this approach is that it leads to a more parsimonious model that is not over-fitted.

However, before doing anything, do you have a good reason for wanting X, Condition and their interaction, all to vary by subject in the first place ? Does the theory of how the data are generated suggest this ?

If you desire to fit the model with the maximal random effects structure, and lme4 obtains a singular fit, then fitting the same model in a Bayesian framework might very well inform you why lme4 had problems, by inspecting trace plots and how well the various parameter estimates converge. The advantage in taking the Bayesian approach is that by doing so you may uncover a problem with original model ie. the reason why the maximum random effects structure isn’t supported by the data) or it might uncover why lme4 is unable to fit the model. I have encountered situations where a Bayesian model does not converge well, unless informative priors are used – which may or may not be OK.

In short, both approaches have merit.

However, I would always start from a place where the initial model is parsimonious and informed by expert domain knowledge to determine the most appropriate random effects structure. Specifying grouping variables is relatively easy, but random slopes usually don’t have to be included. Only include them if they make sound theoretical sense AND they are supported by the data.

Edit: It is mentioned in the comments that there are sound theoretical reasons to fit the maximal random effects structure. So, a relatively easy way to proceed with an equivalent Bayesian model is to swap the call to glmer with stan_glmer from the rstanarm package – it is designed to be plug and play. It has default priors, so you can quickly get a model fitted. The package also has many tools for assessing convergence. If you find that all the parameters have converging to plausible values, then you are all good. However there can be a number of issues – for example a variance being estimated at or below zero, or an estimate that continues to drift. The mc-stan.org site has a wealth of information and a user forum.

Answered by Robert Long on December 25, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP