How to implement single Imputation from conditional distribution?

Question

In [*] page 264, a method of drawing a missing value from a conditional distribution $P(bf{x}_{mis}|bf{x}_{obs};theta)$ which is defined as:

I did not find any code implementation of this approach. My question is, how to implement it? Should we integrate the distribution w.r.t an assumed interval of $bf{x}_{mis}$? Otherwise, is this just an intuitive mathematical representation that should be understood but the implementation is different.

[*] Theodoridis, S., & Koutroumbas, K. “Pattern recognition. ” Fourth Edition, 9781597492720, 2008

Erik · Answer

This is just an intuitive explanation of a group of a strategy for imputing missing data.

In practice, the distribution $P(x_{mis}|x_{obs};mathbf{theta})$ is unknown and can be estimated at best. The best way to estimate this probability is use-case specific. Understanding how the training data was collected can help you in estimating/defining this conditional distribution.

In practice, we often do not try to get a good estimation. Keeping things simple and assuming all features are sampled from a normal distribution might get you started.

This is $x_{mis}$ follows $N(mu, sigma)$ where

$mu = sum_i^N frac{x_{obs,i}}{N}$ 
$sigma = sum_i^N frac{(x_{obs,i}-mu)^2}{N-1}$

. However, such assumptions are rarely realistic and will do guarantee good models.
See this.

How to implement single Imputation from conditional distribution?

One Answer

Add your own answers!

Ask a Question