Cross Validated Asked on December 25, 2021
I have a linear model with response variable $\textbf{y}$ and explanatory variable matrix $\textbf{X}$ for which coefficients $\textbf{b}$ are physically meaningful and worth estimating:
\begin{equation}
\textbf{y} = \textbf{X}\textbf{b} + \textbf{e}
\end{equation}
However, the relationship between $\textbf{y}$ and $\textbf{X}$ is not strictly linear over the entire domain, but could be better modeled as such within several subgroups $g$ (and coefficients are more meaningful if defined for each subgroup):
\begin{equation}
\textbf{y}_g = \textbf{X}_g\textbf{b}_g + \textbf{e}_g
\end{equation}
If we arrive at the subgroups through cluster analysis of the explanatory variables $\textbf{X}$ or some prior segregation based on similarity, the subgroup models could increasingly suffer from multicollinearity.
I imagine this is not uncommon in multilevel or hierarchical modeling. If the goal is not just to create a predictive model, is there a general approach to parameter estimation in such situations?
Since you say:
However, the relationship between $\textbf{y}$ and $\textbf{X}$ is not strictly linear over the entire domain, but could be better modeled as such within several subgroups $g$ (and coefficients are more meaningful if defined for each subgroup):
it sounds to me very much like a mixed effects model (of which multilevel and hierarchical models are special cases) with random slopes for $\textbf{X}$ in subgroups $g$. This will have the general form:
$$y = \textbf{X}\beta+\textbf{Z}u+e$$
where $\beta$ is a vector of fixed effects, $\textbf{X}$ and $\textbf{Z}$ are the model matrices for the fixed effects and random effects respectively, $u$ is a vector of random effects and $e$ is a vector of residual errors, such that $E(u) = E(e) = 0$.
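To connect this with the notation in the question, here is one way to read the model for a single subgroup. This is only a sketch: it assumes every explanatory variable gets a random slope (so the random-effects design for group $g$ is $\textbf{X}_g$ itself) and that the group deviations $u_g$ share a common normal distribution with covariance $\Sigma$. Neither assumption is stated above, and $u_g$ and $\Sigma$ are symbols introduced here for illustration.

\begin{equation}
y_g = \textbf{X}_g\beta + \textbf{X}_g u_g + e_g = \textbf{X}_g(\beta + u_g) + e_g, \qquad u_g \sim N(0, \Sigma)
\end{equation}

Under these assumptions the subgroup coefficients in the question correspond to $\textbf{b}_g = \beta + u_g$. Because the $u_g$ are drawn from a common distribution, each group's estimate is pulled towards $\beta$ rather than being estimated from that group's data alone.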
In R you could fit such a model with, for example:
y = func(y ~ X1 + X2 + (X1 + X2 | g ), ...)
where func will be the relevant function from whatever package you choose, e.g. lme4 or GLMMadaptive. Note that some packages, e.g. nlme, use different syntax. This will estimate fixed effects (slopes) and random slopes for X1 and X2, and random intercepts for each group. If you do not want random intercepts, i.e. you wish to allow the slopes to vary by group but have all lines pass through the same point on the y axis, then you would use:
y = func(y ~ X1 + X2 + (X1 + X2 + 0 | g ), ...)
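As a concrete illustration of the two formulas above, here is a minimal sketch using lmer from lme4 on simulated data. The data, the group count, the true coefficient values and the object names (dat, fit, fit0) are all made up for the example; only the model formulas come from the answer.

```r
# Minimal sketch on simulated data; assumes the lme4 package is installed.
library(lme4)

set.seed(1)
n_groups <- 12   # hypothetical number of subgroups
n_per    <- 50   # hypothetical observations per subgroup
gid <- rep(seq_len(n_groups), each = n_per)

# Group-specific intercepts and slopes = common value + group deviation
a  <- rnorm(n_groups, mean = 0,  sd = 1.0)
b1 <- rnorm(n_groups, mean = 2,  sd = 0.5)
b2 <- rnorm(n_groups, mean = -1, sd = 0.5)

X1 <- rnorm(n_groups * n_per)
X2 <- rnorm(n_groups * n_per)
y  <- a[gid] + b1[gid] * X1 + b2[gid] * X2 + rnorm(n_groups * n_per, sd = 0.3)
dat <- data.frame(y = y, X1 = X1, X2 = X2, g = factor(gid))

# Random intercepts and random slopes for X1 and X2 within g
fit <- lmer(y ~ X1 + X2 + (X1 + X2 | g), data = dat)

# Random slopes only, no random intercepts
fit0 <- lmer(y ~ X1 + X2 + (X1 + X2 + 0 | g), data = dat)

fixef(fit)     # fixed effects: overall intercept and slopes
coef(fit)$g    # per-group coefficients: fixed effects plus predicted random effects

# nlme puts the random part in a separate argument for the same model:
# nlme::lme(y ~ X1 + X2, random = ~ X1 + X2 | g, data = dat)
```

Each row of coef(fit)$g gives the coefficients for one subgroup, which is probably the closest analogue of the per-group $\textbf{b}_g$ asked about in the question. With few groups or little slope variation, lmer may report a singular fit, in which case a simpler random-effects structure may be worth considering.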
Answered by Robert Long on December 25, 2021