TransWikia.com

Whether to replace NaN values in the result column

Data Science Asked by Amethyst on November 18, 2020

I have a training dataset in which I need to predict "Result" from the features "A", "B", "C" and "D" using machine learning. For a few rows, "Result" is empty (7 of 19,612).

For the other features I have filled the NaN values with the column mean, but I don't understand whether I should do the same for the result column. Would it be better to drop those rows entirely instead?
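A minimal sketch of the two steps the question describes, assuming the data sits in a pandas DataFrame with feature columns "A" to "D" and a target column "Result" (the toy values below are invented for illustration): drop the rows whose target is missing, then mean-impute only the feature columns.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real training set (values are invented).
df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 4.0],
    "B": [0.5, np.nan, 1.5, 2.0],
    "C": [3.0, 3.0, 3.0, np.nan],
    "D": [10.0, 20.0, 30.0, 40.0],
    "Result": [1.0, np.nan, 0.0, 1.0],
})

features = ["A", "B", "C", "D"]

# Drop the rows whose target is missing: they carry no label to learn from.
labelled = df.dropna(subset=["Result"]).copy()

# Mean-impute only the feature columns, using means of the remaining rows.
labelled[features] = labelled[features].fillna(labelled[features].mean())

X = labelled[features]
y = labelled["Result"]
```

In a real train/test setup, the means would normally be learned on the training split only (for example with `sklearn.impute.SimpleImputer(strategy="mean")` fitted on the training data) so that test-set statistics do not leak into training.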

One Answer

There is no single correct way to deal with NaN values; it depends on the dataset. But rather than doing single imputation, I would suggest multiple imputation, for example using IterativeImputer in scikit-learn.
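A sketch of one way to read this suggestion, on synthetic data: treat "Result" as just another column and let scikit-learn's IterativeImputer fill it from "A" to "D". By default (`sample_posterior=False`) IterativeImputer returns a single deterministic imputation, so to get *multiple* imputations the sketch reruns it with `sample_posterior=True` and a different `random_state` each time. The helper name `multiple_imputations` and the data are hypothetical.

```python
import numpy as np
# IterativeImputer is still experimental in scikit-learn and must be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
# Synthetic matrix with columns A, B, C, D, Result; Result depends on the features.
data = rng.normal(size=(200, 5))
data[:, 4] = data[:, :4].sum(axis=1) + rng.normal(scale=0.1, size=200)
# Knock out a few targets and a few feature values.
data[rng.choice(200, size=7, replace=False), 4] = np.nan
data[rng.choice(200, size=10, replace=False), 0] = np.nan
mask = np.isnan(data)


def multiple_imputations(X, m=5):
    """Hypothetical helper: return m completed copies of X, each drawn with a different seed."""
    completed = []
    for seed in range(m):
        # sample_posterior=True draws imputed values from the model's predictive
        # distribution, so different seeds give different plausible fills.
        imputer = IterativeImputer(sample_posterior=True, random_state=seed, max_iter=10)
        completed.append(imputer.fit_transform(X))
    return completed


imputed_sets = multiple_imputations(data, m=5)
```

Downstream, a model would be fitted on each completed dataset and the results pooled (for example by averaging predictions); that pooling is what distinguishes multiple imputation from a single fill.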

Since very few target values are missing, the choice won't make a significant difference unless the problem is sensitive to outliers, as in fraud detection, where the rows with a missing target might be exactly the unusual cases that matter.
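To check whether the missing targets could matter, a quick sketch (the helper `missing_target_report` is hypothetical, and the data is synthetic with the same 7-in-19,612 proportion) reports the share of rows without a target and compares their feature means with the rest. A large gap would hint that those rows are unusual rather than randomly missing.

```python
import numpy as np
import pandas as pd


def missing_target_report(df, target="Result"):
    """Share of rows with a missing target, plus feature means for those rows vs. the rest."""
    missing = df[target].isna()
    features = df.columns.drop(target)
    share = missing.mean()
    comparison = pd.DataFrame({
        "missing_target": df.loc[missing, features].mean(),
        "has_target": df.loc[~missing, features].mean(),
    })
    return share, comparison


# Synthetic data: 19,612 rows, 7 of them without a target.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(19612, 4)), columns=list("ABCD"))
df["Result"] = rng.normal(size=19612)
df.loc[:6, "Result"] = np.nan  # label-based and inclusive: rows 0..6

share, comparison = missing_target_report(df)
print(f"{share:.4%}")
print(comparison)
```

With only 7 rows in the "missing_target" group, its means are noisy, so this is a rough sanity check rather than a statistical test.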

For details, refer to the scikit-learn documentation on iterative imputation.

Answered by prashant0598 on November 18, 2020

