Data Science Asked by Amethyst on November 18, 2020
I have a training dataset where we have to predict "Result" based on features "A", "B", "C" and "D" using machine learning. For a few rows, the "Result" is empty (7/19612).
While I have filled the NaN values in the other features with their mean, I don't understand whether or not to do the same with the "Result" column. Would it be better to drop those rows entirely instead?
There is no single correct way to deal with NaN values; it depends on the dataset. But rather than single imputation (such as filling with the mean), I would suggest trying multiple imputation using IterativeImputer in sklearn.
Since very few target values are missing (7 of 19,612 rows), it won't make a significant impact unless the problem is sensitive to finding outliers, as in fraud detection.
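To check how small the affected fraction is, and to see what dropping those rows looks like, here is a minimal sketch. The column names "A" to "D" and "Result" come from the question; the data itself is synthetic and only stands in for the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (assumption: numeric features A-D
# and a "Result" target with a handful of missing values).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=["A", "B", "C", "D"])
df["Result"] = rng.integers(0, 2, size=100).astype(float)
df.loc[[3, 17, 42], "Result"] = np.nan

# Count the rows whose target is missing.
n_missing = int(df["Result"].isna().sum())

# Drop only the rows where the target is missing; missing features are untouched.
clean = df.dropna(subset=["Result"])

print(f"dropped {n_missing} of {len(df)} rows ({n_missing / len(df):.2%})")
```

With 7 missing targets out of 19,612 rows, the same computation gives roughly 0.04% of the data.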
Refer to the scikit-learn documentation on Iterative Imputation.
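A rough sketch of IterativeImputer applied to the feature columns, assuming numeric features named "A" to "D" as in the question. The data is synthetic, and the parameter values (`max_iter=10`, `random_state=0`) are illustrative choices rather than recommendations.

```python
import numpy as np
import pandas as pd

# IterativeImputer is still experimental in scikit-learn, so this
# import is required before it can be imported from sklearn.impute.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Synthetic features (assumption: stands in for the real A-D columns).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=["A", "B", "C", "D"])
X["B"] = 0.8 * X["A"] + 0.2 * X["B"]  # make B correlated with A

# Knock out about 10% of the entries at random.
mask = rng.random(X.shape) < 0.1
X_missing = X.mask(mask)

# Each feature with missing values is modelled from the other features,
# cycling through the columns for up to max_iter rounds.
imputer = IterativeImputer(max_iter=10, random_state=0)
X_imputed = pd.DataFrame(
    imputer.fit_transform(X_missing),
    columns=X.columns,
    index=X.index,
)

print(X_imputed.isna().sum())
```

With default settings this returns one completed dataset. To get several different imputations, the scikit-learn documentation suggests setting `sample_posterior=True` and refitting with different `random_state` values.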
Answered by prashant0598 on November 18, 2020