Data Science Asked by Younes on September 28, 2021
In [*] page 264, a method of drawing a missing value from a conditional distribution $P(bf{x}_{mis}|bf{x}_{obs};theta)$ which is defined as:
I did not find any code implementation of this approach. My question is, how to implement it? Should we integrate the distribution w.r.t an assumed interval of $bf{x}_{mis}$? Otherwise, is this just an intuitive mathematical representation that should be understood but the implementation is different.
[*] Theodoridis, S., & Koutroumbas, K. “Pattern recognition. ” Fourth Edition, 9781597492720, 2008
This is just an intuitive explanation of a group of a strategy for imputing missing data.
In practice, the distribution $P(x_{mis}|x_{obs};mathbf{theta})$ is unknown and can be estimated at best. The best way to estimate this probability is use-case specific. Understanding how the training data was collected can help you in estimating/defining this conditional distribution.
In practice, we often do not try to get a good estimation. Keeping things simple and assuming all features are sampled from a normal distribution might get you started.
This is $x_{mis}$ follows $N(mu, sigma)$ where
. However, such assumptions are rarely realistic and will do guarantee good models. See this.
Answered by Erik on September 28, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP