Does KNN imputation use the mean or the mode?

Question

In my current project, I am doing KNN imputation with K = 5 using sklearn.impute.KNNImputer. I have a mix of continuous and nominal variables (encoded as 0/1, plus ordinal ones encoded as 0/0.25/0.5/0.75/1, etc.). However, the docs say "Each sample’s missing values are imputed using the mean value from n_neighbors nearest neighbors found in the training set." Because of this, I am getting in-between values like 0.4 for nominal attributes. Is there any way to override this so that nominal columns are imputed with the mode instead of the mean?
Also, I looked at missingpy and fancyimpute, but they both seem to use the mean as well.

Brian Spiering · Answer

By default, scikit-learn's KNNImputer uses a NaN-aware Euclidean distance (metric="nan_euclidean") to find neighbors and the mean of the neighbors' values to impute.
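If you have a combination of continuous and nominal variables, you should pass in a different distance metric through the metric parameter, because Euclidean distance treats the codes of nominal variables as magnitudes.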
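As a rough illustration of what such a metric could look like, here is a Gower-style distance that skips missing coordinates. The name `mixed_distance` and the `NOMINAL_COLS` layout are hypothetical, and the sketch assumes missing values are NaN and continuous columns are already scaled to [0, 1].

```python
import numpy as np

# Hypothetical column layout: indices of the nominal columns in each row.
NOMINAL_COLS = np.array([2])


def mixed_distance(x, y, *, missing_values=np.nan):
    """Gower-style distance between two rows, ignoring missing coordinates.

    Continuous columns contribute |x - y| (assumed scaled to [0, 1]);
    nominal columns contribute 0 on a match and 1 on a mismatch.
    This sketch assumes missing entries are NaN and ignores `missing_values`.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Only coordinates observed in both rows take part in the distance.
    present = ~(np.isnan(x) | np.isnan(y))
    if not present.any():
        return np.nan  # no shared observed coordinates

    is_nominal = np.zeros(x.shape[0], dtype=bool)
    is_nominal[NOMINAL_COLS] = True

    diff = np.abs(x - y)
    diff[is_nominal] = (x[is_nominal] != y[is_nominal]).astype(float)

    # Average over the shared coordinates so rows with more gaps are comparable.
    return float(diff[present].mean())
```

KNNImputer documents that `metric` may be a callable taking two rows and a `missing_values` keyword and returning a scalar, so a function of this shape could be passed as `KNNImputer(n_neighbors=5, metric=mixed_distance)`. Note that this only changes which neighbors are selected; the imputed value is still their mean.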
If you want to use an imputation function other than the mean, such as the mode, you'll have to implement that yourself; KNNImputer has no option for it.
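A minimal sketch of such a do-it-yourself imputer, using only NumPy, might look like the following. The function name `knn_impute_mixed` and its parameters are hypothetical, not part of scikit-learn; it assumes missing values are NaN, uses a NaN-aware Euclidean distance over shared coordinates, and imputes nominal columns with the neighbors' mode and all other columns with their mean.

```python
import numpy as np


def knn_impute_mixed(X, nominal_cols, n_neighbors=5):
    """Fill NaNs using the k nearest rows: mode for nominal columns, mean otherwise."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    nominal = set(nominal_cols)
    missing = np.isnan(X)

    for i in np.flatnonzero(missing.any(axis=1)):
        # Root mean squared difference over coordinates observed in both rows.
        both = ~missing[i] & ~missing
        sq = np.where(both, (X - X[i]) ** 2, 0.0)
        n_shared = both.sum(axis=1)
        dist = np.full(X.shape[0], np.inf)
        ok = n_shared > 0
        dist[ok] = np.sqrt(sq[ok].sum(axis=1) / n_shared[ok])
        dist[i] = np.inf  # a row is never its own neighbor

        for j in np.flatnonzero(missing[i]):
            # Donors must have column j observed and a finite distance to row i.
            donors = np.flatnonzero(~missing[:, j] & np.isfinite(dist))
            if donors.size == 0:
                continue  # nothing to impute from; leave the NaN in place
            order = np.argsort(dist[donors], kind="stable")[:n_neighbors]
            values = X[donors[order], j]
            if j in nominal:
                # Mode of the neighbors; ties resolve to the smallest level.
                levels, counts = np.unique(values, return_counts=True)
                out[i, j] = levels[np.argmax(counts)]
            else:
                out[i, j] = values.mean()
    return out


# Toy data: two continuous columns and one 0/1 nominal column (index 2).
data = [
    [0.00, 0.00, 1.0],
    [0.10, 0.10, 1.0],
    [0.20, 0.00, 0.0],
    [0.90, 1.00, 0.0],
    [0.05, 0.05, np.nan],
]
filled = knn_impute_mixed(data, nominal_cols=[2], n_neighbors=3)
print(filled[4])  # the nominal entry is an observed level, not a fraction
```

Ordinal columns encoded as 0/0.25/0.5/0.75/1 could be listed in `nominal_cols` as well if in-between values are not acceptable, or imputed with a median instead; which is preferable depends on the data.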
