Data Science Asked by Smitha on July 12, 2021
I am using IterativeImputer to impute my dataset.
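import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, enables the experimental IterativeImputer
from sklearn.impute import IterativeImputer
# hosp: DataFrame with missing values; cols: column names for the output frame
imp = IterativeImputer(random_state=0, max_iter=100, verbose=10)
imp.fit(hosp)
hosp_imputed = pd.DataFrame(imp.transform(hosp), columns=cols)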
I have a boolean column "ICU" that had 8 missing values, but after imputation the values in it look very strange (see the attached screenshot). What am I doing wrong?
I'm not sure exactly why this happens, but it seems to be normal behaviour. See the example from the sklearn documentation: https://scikit-learn.org/stable/modules/impute.html#multivariate-feature-imputation
import numpy as np
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
imp = IterativeImputer(max_iter=10, random_state=0)
imp.fit([[1, 2], [3, 6], [4, 8], [np.nan, 3], [7, np.nan]])
X_test = [[np.nan, 2], [6, np.nan], [np.nan, 6]]
# the model learns that the second feature is double the first
print(np.round(imp.transform(X_test)))
They use np.round at the end to get these results:
array([[ 1., 2.],
[ 6., 12.],
[ 3., 6.]])
Running this code without rounding gives results similar to yours:
array([[ 1.00007297, 2. ],
[ 6. , 12.00002754],
[ 2.99996145, 6. ]])
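One likely reason, added here for context rather than taken from the original answer: IterativeImputer's default estimator is BayesianRidge, which is a regressor, so every missing entry is filled with a continuous prediction from a regression on the other columns, even when the column only ever holds 0 and 1. The minimal sketch below assumes a hypothetical two-column array (a 0/1 flag plus one continuous feature) purely to show this effect.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical toy data: column 0 is a 0/1 flag, column 1 is continuous.
# The flag is missing in the last row.
X = np.array([
    [0, 1.0],
    [1, 5.0],
    [0, 2.0],
    [1, 6.0],
    [np.nan, 4.0],
])

# No estimator is passed, so the default BayesianRidge regressor is used.
imp = IterativeImputer(random_state=0)
X_imputed = imp.fit_transform(X)

# The imputed flag is a regression prediction, not a class label.
print(X_imputed[4, 0])
```

With these numbers the flag correlates positively with the second column, so the imputed value should come out as a fraction between 0.5 and 1 rather than a clean 0 or 1, which is the same kind of output described in the question.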
Try applying np.round before converting the result to a pd.DataFrame and see if that helps.
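A minimal sketch of that suggestion, assuming a small hypothetical stand-in for the asker's `hosp` DataFrame (the `age` column and its values are invented for illustration). It rounds only the boolean `ICU` column instead of the whole array, and clips it to [0, 1] because a regression prediction can fall outside that range.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical stand-in for `hosp`: one continuous column and a 0/1 "ICU"
# column with missing entries.
hosp = pd.DataFrame({
    "age": [25.0, 60.0, 47.0, 80.0, 33.0, 71.0],
    "ICU": [0, 1, np.nan, 1, 0, np.nan],
})
cols = hosp.columns

imp = IterativeImputer(random_state=0, max_iter=100)
hosp_imputed = pd.DataFrame(imp.fit_transform(hosp), columns=cols)

# Round only the boolean column, clip it to the valid range {0, 1},
# then store it as an integer flag.
hosp_imputed["ICU"] = np.clip(np.round(hosp_imputed["ICU"]), 0, 1).astype(int)
print(hosp_imputed)
```

Rounding the entire output with np.round, as in the documentation example, would also round any genuinely continuous columns, so restricting it to the boolean column is usually the safer choice.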
Correct answer by Beniamin H on July 12, 2021