Cross Validated

Asked by 24n8 on December 20, 2021
In supervised learning, we refer to the regressors as independent variables and to the response variables as dependent variables, but from a probabilistic standpoint I am having trouble understanding this.
To break down my confusion, I think it makes sense to consider two separate cases: (1) the regressors are fixed / constant / deterministic; (2) the regressors are random variables.
(1)
Constants can also be viewed as random variables. We know from probability theory that a constant random variable is independent of any other random variable, and we also know that independence is symmetric: if $X$ is independent of $Y$, then $Y$ is independent of $X$. You can see this easily from conditional probability: $P(X,Y) = P(X|Y)P(Y) = P(Y|X)P(X)$. So if $X$ is independent of $Y$, then $P(X|Y) = P(X)$, which gives $P(X,Y) = P(X)P(Y)$, and hence $P(Y|X) = P(Y)$.
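A quick simulation makes the constant case tangible. This is a minimal sketch in Python/NumPy; the constant value 3 and the coin-flip $Y$ are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100_000
# X is a "constant random variable": it takes the value 3 with probability 1.
x = np.full(n, 3)
# Y is genuinely random, say a fair coin flip.
y = rng.integers(0, 2, size=n)

# P(Y = 1) unconditionally ...
p_y = (y == 1).mean()
# ... versus P(Y = 1 | X = 3), estimated on the subsample where X = 3
# (which is every draw, precisely because X is constant).
p_y_given_x = (y[x == 3] == 1).mean()

print(p_y, p_y_given_x)  # identical here: conditioning on a constant changes nothing
```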
But how does this make sense in the context of supervised learning? We assume that $Y$ is dependent on $X$, but not vice versa?
(2)
The same idea as in case (1) holds, except that $X$ is no longer fixed here.
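The symmetry is just as easy to see empirically when $X$ is random. A minimal sketch, assuming a noisy linear relationship $Y = 2X + \varepsilon$ purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100_000
# X is now random, and Y depends on X through a noisy linear relationship.
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)

# Statistical dependence runs both ways: conditioning on either variable
# shifts the distribution of the other.
print(y[x > 0].mean(), y.mean())  # E(Y | X > 0) is ~1.6, while E(Y) is ~0
print(x[y > 0].mean(), x.mean())  # E(X | Y > 0) is ~0.7, while E(X) is ~0
```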
The "dependent" and "independent" terminology for the variables is unfortunate terminology, which is best avoided. Statistical dependence is always bidirectional ---i.e., if a variable is statistically dependent on another variable, then that second variable is also statistically dependent with the first variable. In a regression model the two variables are posited to have a statistical relationship. We treat the explanatory (regressor) variables $mathbf{x}$ as fixed and we model the regression function $u(mathbf{x}) = mathbb{E}(Y|mathbf{x})$, which is the conditional expected value of the response (regressand) variable $Y$. See this related question for more discussion on the unfortunate terminology.
Answered by Ben on December 20, 2021