Cross Validated Asked on December 20, 2021
The DHARMa package in R aims to provide scaled (quantile) residuals that, according to the DHARMa vignette, “can be interpreted as intuitively as residuals from a linear regression” but for generalized linear (mixed) models.
“For example, a scaled residual value of 0.5 means that half of the
simulated data are higher than the observed value, and half of them
lower. A value of 0.99 would mean that nearly all simulated data are
lower than the observed value.”
Even though this is supposed to be intuitive, I am at a loss as to how to understand this concept. A linear regression residual close to zero means that the model is a good fit for the observed value. A negative residual means that the model overestimates the effect of the independent variables in that particular case. What is the equivalent of these interpretations for quantile residuals? Is it variation around 0.5?
Syre, you say about linear regression:
“A linear regression residual close to zero means that the model is a good fit for the observed value. A negative residual means that the model overestimates the effect of the independent variables in that particular case.”
I think this is where the misunderstanding starts: a linear regression where all residuals are close to zero (close relative to the residual standard deviation of the regression) is actually NOT a good fit. In a correctly specified linear regression, you assume that the observations scatter around the mean predicted value with a normal distribution. Hence, you fully expect that some values are higher and some are lower. This is not an overestimation of the effect, but a requirement of the model.
The goal of the residual checks for the linear regression is thus not to see whether residuals are close to zero, but whether they are normally distributed around zero!
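A small simulation can illustrate the point. The following is a sketch in plain Python (not R), with made-up coefficients and a made-up noise level `sigma`: data are generated from a linear model that is correct by construction, a least-squares line is fitted by hand, and the residuals are inspected. If the reasoning above holds, the residuals should not cluster at zero; roughly two thirds of them should fall within one residual standard deviation, as expected for a normal distribution.

```python
import random
import statistics

rng = random.Random(42)
sigma = 2.0  # assumed noise level, chosen for illustration

# Data from a model that is correct by construction: y = 1 + 0.5 * x + noise
x = [rng.uniform(0, 10) for _ in range(5000)]
y = [1.0 + 0.5 * xi + rng.gauss(0, sigma) for xi in x]

# Ordinary least squares for a single predictor, computed by hand
mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
resid_sd = statistics.stdev(residuals)

# Share of residuals within one standard deviation of zero
share_within_1sd = sum(abs(r) <= resid_sd for r in residuals) / len(residuals)

print("residual sd:", round(resid_sd, 2))          # should be near sigma
print("share within 1 sd:", round(share_within_1sd, 2))  # should be near 0.68
```

Even though the model is exactly right here, about a third of the points sit more than one standard deviation away from the fitted line.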
The same is true for DHARMa residuals. The only difference is that the expected distribution is uniform, not normal. I quote from the vignette:
“As discussed above, for a correctly specified model we would expect
- a uniform (flat) distribution of the overall residuals
- uniformity in y direction if we plot against any predictor.”
So, interpretation of the residuals is really like in a linear regression, only that the distribution is uniform, and that the mean expectation is at 0.5.
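To make this concrete, here is a minimal sketch of a simulation-based quantile residual, written in plain Python rather than R and not taken from DHARMa's source. The helper names (`rpois`, `quantile_residual`, `simulate_residuals`) and the Poisson means are assumptions for illustration. Ties between the integer simulations and the observation are broken with a uniform random draw, which is one common way to obtain a continuous residual for count data.

```python
import math
import random
import statistics

def rpois(lam, rng):
    """Draw one Poisson(lam) value (Knuth's method; fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def quantile_residual(observed, simulations, rng):
    """Fraction of simulations below the observation, ties split at random."""
    below = sum(s < observed for s in simulations)
    ties = sum(s == observed for s in simulations)
    return (below + rng.random() * ties) / len(simulations)

def simulate_residuals(true_mean, model_mean, n_obs=1000, n_sim=200, seed=1):
    """Observations come from true_mean; the 'fitted model' assumes model_mean."""
    rng = random.Random(seed)
    residuals = []
    for _ in range(n_obs):
        y = rpois(true_mean, rng)
        sims = [rpois(model_mean, rng) for _ in range(n_sim)]
        residuals.append(quantile_residual(y, sims, rng))
    return residuals

# Correct model: residuals should be roughly uniform with mean near 0.5
well_specified = simulate_residuals(true_mean=3.0, model_mean=3.0)
# Model that predicts too high: residuals pile up below 0.5
biased = simulate_residuals(true_mean=3.0, model_mean=5.0)

print("mean residual, correct model:", round(statistics.fmean(well_specified), 3))
print("mean residual, biased model: ", round(statistics.fmean(biased), 3))
```

A single residual of 0.2 or 0.9 says little on its own, just as a single positive or negative residual says little in a linear regression. What carries information is whether the whole set departs from the flat distribution, for example by shifting away from 0.5 as in the biased case.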
Addition in response to the question below:
Yes, you could look at patterns in the DHARMa residuals and attempt an interpretation of why they occur, in the same way as you might do this in a linear regression.
Note that the quote in the paper assumes the simplest linear regression, where a point that is further away from the regression line is also less likely. If the model allows the residual variance to change (e.g. in a gls), such an interpretation of raw residuals no longer makes sense for defining outliers or especially interesting points. The most basic solution is to divide the residuals by their expected standard deviation (= Pearson residuals). The quantile residuals in DHARMa generalize this idea.
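As a small worked example of why raw residuals mislead when the variance changes, the sketch below (plain Python, with a hypothetical helper `pearson_residual`) takes the Pearson residual to be the raw residual divided by the model's expected standard deviation, i.e. the square root of the expected variance. It uses a Poisson model, where the variance equals the mean, and compares the same raw residual of +6 at two different predicted means.

```python
import math

def pearson_residual(observed, expected_mean, expected_variance):
    """Raw residual scaled by the expected standard deviation."""
    return (observed - expected_mean) / math.sqrt(expected_variance)

# For a Poisson model the expected variance equals the expected mean.
for mean in (4.0, 100.0):
    observed = mean + 6.0
    raw = observed - mean
    scaled = pearson_residual(observed, mean, expected_variance=mean)
    print(f"mean={mean:>5}: raw residual={raw}, Pearson residual={scaled}")
```

The raw residual is identical in both cases, but it corresponds to three standard deviations when the predicted mean is 4 and to only 0.6 standard deviations when the predicted mean is 100.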
A special property of the quantile residuals is that the observed value is compared against a simulated distribution. In DHARMa, I call residuals of exactly 0 or 1 outliers, because the observed value lies outside the simulation range. What's different compared to normal outliers is that we know they are outside, but we don't know HOW FAR outside (you get a value of zero if the observed value is smaller than all simulations, regardless of how much smaller). That's why this type of outlier is highlighted separately in DHARMa.
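The loss of distance information can be seen in a toy calculation. This is a simplified sketch in plain Python, not DHARMa's code: the function `scaled_residual` is hypothetical, the ten "simulations" are made-up numbers, and ties are ignored for brevity.

```python
def scaled_residual(observed, simulations):
    """Fraction of simulated values that lie below the observed value."""
    below = sum(s < observed for s in simulations)
    return below / len(simulations)

# Made-up simulated responses for one data point (range 2 to 7)
simulations = [3, 5, 4, 6, 2, 5, 7, 4, 3, 5]

# Inside the simulation range the residual is informative
inside = scaled_residual(4.5, simulations)

# Above the range: always 1, whether the value is slightly or hugely too large
above = [scaled_residual(obs, simulations) for obs in (8, 80, 800)]

# Below the range: always 0
below = [scaled_residual(obs, simulations) for obs in (1, -50)]

print("inside range:", inside)
print("above range: ", above)
print("below range: ", below)
```

An observation of 8 and an observation of 800 receive the same residual, so a residual of exactly 0 or 1 only tells you that the point is outside everything the model simulated.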
Answered by Florian Hartig on December 20, 2021