Data Science Asked by nisrine hammout on June 14, 2021
I am having a hard time evaluating my imputation model.
I used an iterative imputer model to fill in the missing values in all four columns.
As the estimator inside the iterative imputer I am using a random forest. Here is my code for imputing:
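import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must come before importing IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
imp_mean = IterativeImputer(estimator=RandomForestRegressor(), random_state=0)
imp_mean.fit(my_data)
my_data_filled = pd.DataFrame(imp_mean.transform(my_data))
my_data_filled.head()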
My problem is how can I evaluate my model. How can I know if the filled values are right?
I called describe() before and after filling in the missing values, and it gives me nearly the same mean and std. The correlations between variables also stayed nearly the same, with only slight changes.
When imputing data, the goal is to avoid distorting the true distribution of your data. So one way to check how good your imputation was is to test, for every feature that has been imputed, the distribution after imputation against the distribution of the same feature before imputation (with a Kolmogorov-Smirnov test, for example). If you can state with a given level of confidence that your imputation preserved the distribution, that is one indication it worked.
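A minimal sketch of that KS check, assuming SciPy is available. The synthetic my_data frame, the column names, the 10% missing rate, and the helper name ks_report are all made up for illustration; only ks_2samp and IterativeImputer are real library calls. Each feature's observed (non-missing) values are compared with the same column after imputation.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor


def ks_report(original, filled):
    """Two-sample KS test per feature: observed values vs. the imputed column."""
    rows = []
    for col in original.columns:
        observed = original[col].dropna()
        if len(observed) == len(original):
            continue  # nothing was imputed in this column
        stat, p_value = ks_2samp(observed, filled[col])
        rows.append({"feature": col, "ks_stat": stat, "p_value": p_value})
    return pd.DataFrame(rows)


# Hypothetical data: four numeric columns with about 10% of values removed.
rng = np.random.default_rng(0)
my_data = pd.DataFrame(rng.normal(size=(200, 4)), columns=list("abcd"))
my_data = my_data.mask(rng.random(my_data.shape) < 0.1)

# Small forest and few iterations only to keep the example fast.
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=20, random_state=0),
    max_iter=5,
    random_state=0,
)
my_data_filled = pd.DataFrame(
    imputer.fit_transform(my_data), columns=my_data.columns
)

report = ks_report(my_data, my_data_filled)
print(report)
```

A small p-value suggests the imputation shifted that feature's distribution. A large one only means the test found no evidence of a shift, and since the observed values appear in both samples the test is lenient, so treat it as a sanity check rather than proof.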
Another way, if you have a supervised task, is to compare the performance of your model under each imputation technique. The scikit-learn documentation includes a figure comparing imputation techniques in this way.
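A rough sketch of that comparison, under assumptions that are not in the question: a synthetic regression problem from make_regression, 10% of entries removed at random, Ridge as the downstream model, and three candidate imputers chosen for illustration. Placing the imputer inside the pipeline means it is refit on each training fold, so the score is not inflated by leakage from the validation fold.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical supervised task with values knocked out at random.
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
rng = np.random.default_rng(0)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan

# Candidate imputation techniques to compare (illustrative choices).
imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "iterative_rf": IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=20, random_state=0),
        max_iter=5,
        random_state=0,
    ),
}

scores = {}
for name, imputer in imputers.items():
    model = make_pipeline(imputer, Ridge())
    cv_scores = cross_val_score(
        model, X_missing, y, cv=5, scoring="neg_mean_squared_error"
    )
    scores[name] = -cv_scores.mean()  # mean squared error, lower is better

for name, mse in scores.items():
    print(f"{name}: MSE = {mse:.1f}")
```

The imputer with the lowest cross-validated error is the one that serves this particular downstream model best, which is not necessarily the one that reproduces the missing values most faithfully.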
Correct answer by Julio Jesus on June 14, 2021