TransWikia.com

Does KNN imputation use the mean or the mode?

Data Science Asked on April 25, 2021

In my current project, I am doing KNN imputation with K = 5 using sklearn.impute.KNNImputer. I have a mix of continuous and nominal variables (nominal ones encoded as 0/1, and ordinal ones encoded as 0/0.25/0.5/0.75/1, etc.). However, the docs say "Each sample’s missing values are imputed using the mean value from n_neighbors nearest neighbors found in the training set." Because of this, I am getting in-between values like 0.4 for nominal attributes. Is there any way to override this and use the mode instead of the mean for nominal columns?
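A minimal reproduction of the behavior described, using made-up toy data (the values and column layout are assumptions for illustration): with a 0/1 flag column, KNNImputer fills the gap with the average of the neighbors' flags, so the result lands between 0 and 1.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Made-up data: column 0 is continuous, column 1 is a 0/1 nominal flag.
X = np.array([
    [1.0, 0.0],
    [1.1, 1.0],
    [0.9, 1.0],
    [1.2, 0.0],
    [1.0, 1.0],
    [1.05, np.nan],  # the flag is missing in this row
])

# K = 5 as in the question; the five other rows are all neighbors here.
X_filled = KNNImputer(n_neighbors=5).fit_transform(X)
print(X_filled[5, 1])  # 0.6, the mean of the neighbor flags 0, 1, 1, 0, 1
```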

Also, I looked at missingpy and fancyimpute, but they both seem to use the mean as well.

One Answer

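By default, scikit-learn's KNNImputer searches for neighbors with the nan_euclidean metric (Euclidean distance that ignores coordinates missing in either sample) and imputes with the plain mean of those neighbors (weights="uniform"); weights="distance" switches to a distance-weighted mean.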
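A small sketch to check those defaults and see how nan_euclidean treats a missing coordinate; the two sample rows are arbitrary values chosen for illustration.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.metrics.pairwise import nan_euclidean_distances

# Inspect the default settings of a freshly constructed imputer.
params = KNNImputer().get_params()
print(params["n_neighbors"], params["metric"], params["weights"])  # 5 nan_euclidean uniform

# nan_euclidean skips coordinates missing in either row and rescales the
# remaining squared differences by (total coords / present coords).
a = np.array([[3.0, np.nan, 5.0]])
b = np.array([[1.0, 0.0, 0.0]])
d = nan_euclidean_distances(a, b)
print(d)  # sqrt(3/2 * (2**2 + 5**2)), about 6.595
```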

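If you have a combination of continuous and nominal variables, you should pass a different distance metric through the metric parameter, which accepts a callable (for example, a Gower-style distance that scores nominal columns as match or mismatch).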
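One way this could look, as a sketch rather than a definitive implementation: KNNImputer accepts a callable that takes two 1-D rows plus a missing_values keyword and returns a scalar. The function gower_like_distance, its nominal and ranges parameters, and the toy data are all hypothetical; functools.partial binds the extra parameters so the callable fits the expected signature.

```python
import numpy as np
from functools import partial
from sklearn.impute import KNNImputer


def gower_like_distance(x, y, missing_values=np.nan, nominal=None, ranges=None):
    """Hypothetical Gower-style distance between two 1-D rows x and y.

    Continuous columns contribute |x - y| / range, nominal columns contribute
    0 on a match and 1 on a mismatch. Columns missing in either row are
    skipped, and the result is the average over the remaining columns.
    """
    if isinstance(missing_values, float) and np.isnan(missing_values):
        missing = np.isnan(x) | np.isnan(y)
    else:
        missing = (x == missing_values) | (y == missing_values)
    present = ~missing
    if not present.any():
        return np.nan  # no shared columns, mirroring nan_euclidean
    per_column = np.where(nominal, (x != y).astype(float), np.abs(x - y) / ranges)
    return per_column[present].mean()


# Toy data: column 0 is continuous, column 1 is a 0/1 nominal flag.
X = np.array([
    [1.0, 0.0],
    [1.1, 1.0],
    [0.9, 1.0],
    [3.0, 0.0],
    [1.05, np.nan],
])
nominal = np.array([False, True])
ranges = np.nanmax(X, axis=0) - np.nanmin(X, axis=0)
ranges[ranges == 0] = 1.0  # avoid dividing by zero for constant columns

# KNNImputer passes missing_values=... to the callable; partial binds the rest.
metric = partial(gower_like_distance, nominal=nominal, ranges=ranges)
X_filled = KNNImputer(n_neighbors=3, metric=metric).fit_transform(X)
print(X_filled[4, 1])
```

Note that the imputed flag here is still 2/3 rather than 0 or 1: the metric only changes which neighbors are picked, and KNNImputer still averages their values.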

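The metric only controls which neighbors are selected; the imputed value is still their mean. If you want an imputation function other than the mean (such as the mode for nominal columns), you'll have to implement it yourself.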
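A minimal do-it-yourself sketch, assuming missing entries are np.nan and no column is entirely missing (KNNImputer can drop such columns, which would shift the indices). The helper knn_impute_mixed is hypothetical: it keeps KNNImputer's mean for continuous columns, then overwrites each nominal gap with the majority value among the n_neighbors nearest rows that have that column observed, using the same nan_euclidean distance.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.metrics.pairwise import nan_euclidean_distances


def knn_impute_mixed(X, nominal, n_neighbors=5):
    """Hypothetical helper: neighbor mean for continuous columns,
    neighbor mode (majority vote) for nominal columns."""
    X = np.asarray(X, dtype=float)
    # Continuous columns: KNNImputer's usual neighbor mean.
    X_out = KNNImputer(n_neighbors=n_neighbors).fit_transform(X)
    # Same distance KNNImputer uses by default, computed on the raw data.
    dist = nan_euclidean_distances(X, X)
    for j in np.flatnonzero(nominal):
        observed = ~np.isnan(X[:, j])
        donors = np.flatnonzero(observed)  # rows where column j is known
        for i in np.flatnonzero(~observed):
            # The n_neighbors closest donors (NaN distances sort last).
            order = np.argsort(dist[i, donors], kind="stable")
            nearest = donors[order[:n_neighbors]]
            values, counts = np.unique(X[nearest, j], return_counts=True)
            X_out[i, j] = values[np.argmax(counts)]  # ties go to the smallest value
    return X_out


X = np.array([
    [1.0, 0.0],
    [1.1, 1.0],
    [0.9, 1.0],
    [3.0, 0.0],
    [1.05, np.nan],
])
X_filled = knn_impute_mixed(X, nominal=np.array([False, True]), n_neighbors=3)
print(X_filled[4, 1])  # 1.0: two of the three nearest rows have flag 1
```

On the same toy data where plain KNNImputer returns 2/3, this returns a valid category. For the ordinal columns in the question, the same vote keeps values on the 0/0.25/0.5/0.75/1 grid.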

Answered by Brian Spiering on April 25, 2021
