TransWikia.com

How to implement single Imputation from conditional distribution?

Data Science Asked by Younes on September 28, 2021

In [*] page 264, a method of drawing a missing value from a conditional distribution $P(bf{x}_{mis}|bf{x}_{obs};theta)$ which is defined as:
enter image description here

I did not find any code implementation of this approach. My question is, how to implement it? Should we integrate the distribution w.r.t an assumed interval of $bf{x}_{mis}$? Otherwise, is this just an intuitive mathematical representation that should be understood but the implementation is different.

[*] Theodoridis, S., & Koutroumbas, K. “Pattern recognition. ” Fourth Edition, 9781597492720, 2008

One Answer

This is just an intuitive explanation of a group of a strategy for imputing missing data.

In practice, the distribution $P(x_{mis}|x_{obs};mathbf{theta})$ is unknown and can be estimated at best. The best way to estimate this probability is use-case specific. Understanding how the training data was collected can help you in estimating/defining this conditional distribution.

In practice, we often do not try to get a good estimation. Keeping things simple and assuming all features are sampled from a normal distribution might get you started.

This is $x_{mis}$ follows $N(mu, sigma)$ where

  • $mu = sum_i^N frac{x_{obs,i}}{N}$
  • $sigma = sum_i^N frac{(x_{obs,i}-mu)^2}{N-1}$

. However, such assumptions are rarely realistic and will do guarantee good models. See this.

Answered by Erik on September 28, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP