Data Science: Asked by Matthew Yang on June 7, 2021
I have read in Stanford’s CS229 course notes that, to give the least-squares update rule a probabilistic justification, the following is assumed:
$$y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)},$$
where $\epsilon^{(i)}$ represents random noise, distributed i.i.d. according to a Normal distribution, $\epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2)$.
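For context, the notes use this assumption to argue (as I understand it) that maximizing the log-likelihood of $\theta$ is equivalent to minimizing the usual least-squares cost:

$$\ell(\theta) = \sum_{i=1}^{m} \log p\left(y^{(i)} \mid x^{(i)}; \theta\right) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^{m} \left(y^{(i)} - \theta^T x^{(i)}\right)^2,$$

so maximizing $\ell(\theta)$ amounts to minimizing $J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(y^{(i)} - \theta^T x^{(i)}\right)^2$.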
I understand why $\epsilon^{(i)}$ makes sense when $h_\theta(x^{(i)}) = \theta^T x^{(i)}$ is a trained model, but since the eventual goal of this assumption is to derive the update rule, it should also make sense when $h_\theta$ is not yet trained. However, the assumption does not make much sense to me when the model is arbitrary and untrained. Is my interpretation correct? Have I missed something? If not, how do we justify $\epsilon^{(i)}$ when the model is inaccurate (not trained)?
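To make what I am asking concrete, here is a minimal NumPy sketch (the values of $\theta$, $\sigma$, the learning rate, and the iteration count are my own illustrative choices, not from the notes): the data are generated exactly under the assumed noise model, yet the update rule is applied starting from an arbitrary, untrained $\theta$, for which the residuals are clearly not the i.i.d. Gaussian $\epsilon^{(i)}$ of the assumption.

```python
import numpy as np

# Illustrative sketch only: theta_true, sigma, alpha, and the sizes below
# are arbitrary choices of mine, not values from the CS229 notes.
rng = np.random.default_rng(0)
m, n = 100, 3                               # training examples, features
theta_true = np.array([1.0, -2.0, 0.5])     # parameters used to generate the data
sigma = 0.1

# Generate data exactly under the assumption:
# y = theta_true^T x + epsilon, with epsilon ~ N(0, sigma^2) i.i.d.
X = rng.normal(size=(m, n))
y = X @ theta_true + rng.normal(scale=sigma, size=m)

# Start from an arbitrary (untrained) theta and run batch gradient descent
# on the least-squares cost J(theta) = 1/2 * sum_i (y^(i) - theta^T x^(i))^2.
theta = np.zeros(n)
alpha = 0.1                                 # learning rate
for _ in range(500):
    residuals = y - X @ theta               # y^(i) - theta^T x^(i) for every i
    theta = theta + (alpha / m) * (X.T @ residuals)

print(theta)  # approaches theta_true even though the initial theta was arbitrary
```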
Thanks in advance.