IterativeImputer Evaluation

Question

I am having a hard time evaluating my model of imputation.
I used an iterative imputer model to fill in the missing values in all four columns.
For the model on the iterative imputer, I am using a Random forest model, here is my code for imputing:
imp_mean = IterativeImputer(estimator=RandomForestRegressor(), random_state=0)
imp_mean.fit(my_data)
my_data_filled=  pd.DataFrame(imp_mean.transform(my_data))
my_data_filled.head()

My problem is how can I evaluate my model. How can I know if the filled values are right?
I used a describe function before and after filling in the missing values it gives me nearly the same mean and std. Also, the correlation between variables stayed nearly the same with slight changes.

Julio Jesus · Accepted Answer

When imputing data, one is looking not to modify the true distribution of your data. So a way to test how good your imputation was is to make a test to contrast the true distribution of every feature that has been imputed vs the true (via KS test for example) distribution of the feature (prior imputing) if you can sate with a level. of confidence that your imputation preserved the distribution that would be a way.
Another way would be in case you have a supervised task, you can compare the performance of your model on each imputation technique. Like in the below's image from Scikit-learn documentation:

IterativeImputer Evaluation

One Answer

Add your own answers!

Ask a Question