IterativeImputer - Returning -0 and other weird results

Question

I am using IterativeImputer to impute my dataset.
import pandas as pd
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer

imp = IterativeImputer(random_state=0, max_iter=100, verbose=10)
imp.fit(hosp)

hosp_imputed = pd.DataFrame(imp.transform(hosp), columns=cols)

I have a boolean column "ICU" that had 8 missing values, but after imputation it contains very weird results. Please see the attached screenshot. Can you please let me know what I am doing wrong?

Beniamin H · Accepted Answer

I'm not sure exactly why this happens, but it seems to be normal behavior. Please see the example from the sklearn documentation: https://scikit-learn.org/stable/modules/impute.html#multivariate-feature-imputation
import numpy as np
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
imp = IterativeImputer(max_iter=10, random_state=0)
imp.fit([[1, 2], [3, 6], [4, 8], [np.nan, 3], [7, np.nan]])

X_test = [[np.nan, 2], [6, np.nan], [np.nan, 6]]
# the model learns that the second feature is double the first
print(np.round(imp.transform(X_test)))

They use np.round at the end to get these results:
array([[ 1.,  2.],
       [ 6., 12.],
       [ 3.,  6.]])

Running this code without rounding gives results similar to yours:
array([[ 1.00007297,  2.        ],
       [ 6.        , 12.00002754],
       [ 2.99996145,  6.        ]])

Try applying np.round to the imputed array before passing it to pd.DataFrame and see if that helps.
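As a rough sketch of that suggestion for a 0/1 column like "ICU": the small `hosp` frame below is made up for illustration (the real dataset is not shown in the question), and clipping to [0, 1] before rounding is my own addition, on the assumption that the imputed values can fall slightly outside that range. Note that np.round turns a small negative number such as -0.3 into -0.0, which may be where the "-0" in the output comes from; casting to int removes it.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical stand-in for the asker's `hosp` DataFrame.
hosp = pd.DataFrame({
    "age": [25.0, 40.0, 63.0, 71.0, 35.0, 58.0],
    "ICU": [0.0, 0.0, 1.0, np.nan, 0.0, np.nan],
})

imp = IterativeImputer(random_state=0, max_iter=10)
hosp_imputed = pd.DataFrame(
    imp.fit_transform(hosp), columns=hosp.columns, index=hosp.index
)

# Rounding alone can leave a negative zero behind.
assert np.signbit(np.round(-0.3))

# Clip to the valid range, round to the nearest class, then cast to int
# so the column holds only 0 and 1 (and no -0.0).
hosp_imputed["ICU"] = np.round(hosp_imputed["ICU"].clip(0, 1)).astype(int)

print(hosp_imputed)
```

Only the boolean column is rounded here; continuous columns such as the made-up "age" are left as imputed. Values that were already present in "ICU" are not changed by the imputer, so rounding only affects the filled-in rows.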
