Cross Validated Asked by Gil77 on November 2, 2021
I have two questions about multiple imputation (MI) when assessing the prognostic performance of a test. The test acts as a predictor of a specific outcome 3 years in the future. In my data, 26% of the outcome values are missing.
First, I performed the MI with a predictive matrix (an $N \times (n+1)$ matrix) that contains all the input variables in my database plus the outcome that the test is meant to predict. Here, $N$ is the number of observations and $n$ the number of input variables. I need to know whether this approach is sound. Or should multiple imputation instead be performed with an $N \times n$ predictive matrix that excludes the outcome?
Second, after MI, several imputed data sets are obtained. How can I pool them into a single imputed data matrix? Can anyone share a script in R that performs this? Or are there other ways to analyze the imputed data for this application?
Thanks in advance
It's quite proper to include the outcome variable among those used for imputations, although if only outcome values are missing there might be limited value to doing so. If there are missing values in the predictors too, you need the outcome values to help impute the missing predictors. See the discussion on this page for example.
Second, you don't combine all of the imputations into a single imputed data matrix. Instead you run your analysis on each of the imputed data sets separately, then use Rubin's rules to combine the estimates in a way that takes both variability in the modeling and variability in the imputation into account. There is software that will do that for you, including the mice package in R. Stef van Buuren provides a useful web site with information on multiple imputation in general and that package in particular.
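To make the pooling step concrete, here is a minimal sketch of Rubin's rules for a single scalar estimate, written in plain Python so the arithmetic is visible. The function name `pool_rubin` and the numbers are hypothetical illustrations, not output from any real analysis. In R, the `pool()` function in mice performs this combination for fitted models, so in practice you would not code it by hand.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one scalar estimate across m imputed data sets (Rubin's rules).

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)       # pooled point estimate
    u_bar = mean(variances)       # within-imputation variance
    b = variance(estimates)       # between-imputation variance (divides by m - 1)
    total = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate

    return {
        "estimate": q_bar,
        "within": u_bar,
        "between": b,
        "total": total,
        "se": total ** 0.5,
    }


# Hypothetical example: the same performance measure estimated
# separately on each of 5 imputed data sets.
estimates = [0.70, 0.72, 0.74, 0.71, 0.73]
variances = [0.0004, 0.0004, 0.0004, 0.0004, 0.0004]

pooled = pool_rubin(estimates, variances)
print(pooled)
```

The pooled standard error is larger than the standard error from any single imputed data set, because the between-imputation term carries the extra uncertainty caused by the missing values. Averaging the imputed data into one matrix and analyzing it once would discard that term.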
Answered by EdM on November 2, 2021