Cross Validated Asked by Fraïssé on December 20, 2021
Given a dataset $\mathcal{D} = \{x_i\}_{i=1}^{N}$ with $x_i \in \mathbb{R}$: in machine learning, what assumption is made about how the data are generated?
I've seen two basic ideas circulating, with essentially no comment on which one is more valid:
1. There exists a random variable $X$ whose outcomes are $\{x_i\}$, that is, $X \in \{x_1, \ldots, x_N\}$. $X$ is distributed according to some distribution $P_X$, and the data are sampled sequentially from $P_X$ through some independent process.
2. There exists a random vector $X = (X_1, \ldots, X_N)$, where each $X_i$ has a single realization $x_i$. $X$ has a joint distribution $P_X = P_{X_1, \ldots, X_N}$, and the data are sampled once from $P_X$.
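To make the two views concrete, here is a minimal NumPy sketch. It assumes, purely for illustration, Gaussian distributions with hypothetical parameters `N`, `mu`, `sigma` and `rho`; the helper names (`sample_view1`, `sample_view2`, `normal_logpdf`, `mvn_logpdf`) are also hypothetical and not from any library. The first view draws $N$ independent realizations of a single $X \sim P_X$; the second draws one realization of the vector $(X_1, \ldots, X_N)$. The sketch also checks a standard fact: when the joint distribution factorizes as $P_{X_1, \ldots, X_N}(x_1, \ldots, x_N) = \prod_{i=1}^{N} P_X(x_i)$ (here, a diagonal covariance), the joint log-density equals the sum of the marginal log-densities.

```python
import numpy as np

# Hypothetical illustration parameters (not from the question).
N = 5          # number of observations
mu = 0.0       # mean of P_X
sigma = 1.0    # standard deviation of P_X
rho = 0.8      # correlation parameter for the dependent case
rng = np.random.default_rng(seed=0)


def sample_view1(rng, n, mu, sigma):
    """View 1: n independent draws of a single X ~ Normal(mu, sigma^2)."""
    return rng.normal(loc=mu, scale=sigma, size=n)


def sample_view2(rng, mean, cov):
    """View 2: one draw of the random vector (X_1, ..., X_N) ~ Normal(mean, cov)."""
    return rng.multivariate_normal(mean, cov)


def normal_logpdf(x, mu, sigma):
    """Elementwise log-density of Normal(mu, sigma^2)."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)


def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate normal evaluated at a single vector x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)


mean_vec = np.full(N, mu)

# Diagonal covariance: the joint density factorizes into N identical
# marginals, so a single draw of the vector is an i.i.d. sample (view 1).
cov_iid = sigma**2 * np.eye(N)

# AR(1)-style covariance sigma^2 * rho^|i-j|: the X_i are dependent, so the
# joint distribution does not factorize into identical marginals.
idx = np.arange(N)
cov_dep = sigma**2 * rho ** np.abs(idx[:, None] - idx[None, :])

x_view1 = sample_view1(rng, N, mu, sigma)
x_view2_iid = sample_view2(rng, mean_vec, cov_iid)
x_view2_dep = sample_view2(rng, mean_vec, cov_dep)

# Under the diagonal covariance, joint log-density == sum of marginal log-densities.
joint = mvn_logpdf(x_view2_iid, mean_vec, cov_iid)
product = normal_logpdf(x_view2_iid, mu, sigma).sum()
print(np.isclose(joint, product))  # True
```

In this sketch the first view is the special case of the second in which the joint distribution is a product of identical marginals; with `cov_dep` the components are correlated, so the data are a single joint draw that is not i.i.d.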
Which generative process is more valid or more common in (different models of) machine learning? Please provide a reference as backup if possible.