Data Science Asked by plytheman on September 26, 2021
This may be a dumb question but I can’t figure out how to actually get the values imputed using StatsModels MICE back into my data. I have a dataframe (dfLocal) with hourly temperature records for five neighboring stations (LOC1:LOC5) over many years and I’d like to impute the missing data for any given site. Following the examples I have:
import statsmodels.api as sm
from statsmodels.imputation import mice

imp = mice.MICEData(dfLocal)
fml = 'LOC1 ~ LOC2 + LOC3 + LOC4 + LOC5'
# Use a separate name so the imported mice module is not shadowed
mice_model = mice.MICE(fml, sm.OLS, imp)
results = mice_model.fit(10, 10)  # 10 burn-in cycles, 10 imputations
print(results.summary())
dfLocal.dropna(axis=0, how='all', inplace=True)
imp.data = imp.data.set_index(dfLocal.index)
# In this case I only want to fill one specific set of missing data,
# hence gapStart and gapEnd
dfLocal.loc[gapStart:gapEnd, 'LOC1'] = imp.data[fillSite]
My understanding of MICE is broadly that missing values are imputed multiple times and then combined to find the best value from the many. The only way I’ve found to actually get any numbers out of the above code is with imp.data, but I’m afraid that might just be one of the individual imputations before they’re combined? All I can seem to get from fitting the model (results), though, is the summary?
I’m far from a statistician (and not much of a programmer either) so I’ve been reading through the code for mice.MICE and other resources on general MICE applications, but I’d appreciate any guidance on this as I can’t find much about using statsmodels’ MICE online. Normally I’d post some data on Gist but the full set is a bit large. That said, I’ll upload it if y’all think it would help.
Thanks!
MICE does generate several datasets, but it does not then combine these datasets. Rather, it fits your model on each of those datasets and combines those models. If you really need an imputed dataset, you could just choose one or combine them in whatever way makes sense for your problem (or you might be better off with another method).
Now, for the statsmodels implementation, imp.data only keeps track of the latest imputed set [1]; to get all of the datasets, you can loop through updates rather than using fit, as in an example in [2].
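As a rough sketch of that loop, the snippet below calls update_all on a MICEData object and keeps a copy of imp.data after each cycle. The data here is synthetic and only stands in for dfLocal; the names df, missing_rows, imputed_sets and filled are made up for illustration, and the burn-in and imputation counts are arbitrary. Averaging the imputed values at the end is just one possible way to combine them (it hides the between-imputation uncertainty), not something MICE itself does.

```python
import numpy as np
import pandas as pd
from statsmodels.imputation import mice

# Synthetic stand-in for dfLocal: five correlated station series (assumption).
rng = np.random.default_rng(0)
n = 200
base = rng.normal(10.0, 5.0, size=n)
df = pd.DataFrame(
    {f"LOC{i}": base + rng.normal(0.0, 1.0, size=n) for i in range(1, 6)}
)

# Knock out a block of LOC1 to play the role of the gap to be filled.
missing_rows = np.arange(20, 40)
df.loc[missing_rows, "LOC1"] = np.nan

imp = mice.MICEData(df)

n_burnin, n_imputations = 10, 10  # arbitrary choices for this sketch

# Burn-in: run a few update cycles and discard the results.
for _ in range(n_burnin):
    imp.update_all()

# Each further cycle overwrites imp.data, so copy it to keep every dataset.
imputed_sets = []
for _ in range(n_imputations):
    imp.update_all()
    imputed_sets.append(imp.data.copy())

# One possible combination: the per-row mean of LOC1 across imputed datasets.
loc1_mean = pd.concat([d["LOC1"] for d in imputed_sets], axis=1).mean(axis=1)

# Fill only the gap rows; observed values are left untouched.
filled = df.copy()
filled.loc[missing_rows, "LOC1"] = loc1_mean.loc[missing_rows]
```

Note that MICEData drops all-NaN rows and resets the index internally, so with real data the index of imp.data may need to be realigned with the original dataframe before assigning values back, as the question already does with set_index.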
Answered by Ben Reiniger on September 26, 2021