Error term in probabilistic interpretation of least squares update rule

Asked by Matthew Yang on June 7, 2021 (Data Science)

I have read in Stanford's CS229 course notes that, to justify the least-squares update rule probabilistically, the following model is assumed:

$$y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)},$$

where $\epsilon^{(i)}$ represents random noise, and the $\epsilon^{(i)}$ are assumed to be i.i.d. with $\epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2)$.
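For reference, as I understand it, the notes use this assumption to write down the density of $y^{(i)}$ given $x^{(i)}$ and then show that maximizing the resulting log-likelihood is the same as minimizing the least-squares cost:

$$p(y^{(i)} \mid x^{(i)}; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{\left(y^{(i)} - \theta^T x^{(i)}\right)^2}{2\sigma^2}\right),$$

$$\ell(\theta) = \sum_{i=1}^{m} \log p(y^{(i)} \mid x^{(i)}; \theta) = m \log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2}\sum_{i=1}^{m} \left(y^{(i)} - \theta^T x^{(i)}\right)^2,$$

so maximizing $\ell(\theta)$ is equivalent to minimizing $J(\theta) = \frac{1}{2}\sum_{i=1}^{m} \left(y^{(i)} - \theta^T x^{(i)}\right)^2$, whose gradient yields the least-squares update rule.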

I understand why $\epsilon^{(i)}$ makes sense when the hypothesis $h_\theta(x^{(i)}) = \theta^T x^{(i)}$ is a trained model, but since the eventual goal of this assumption is to derive the update rule, it should also make sense before $h_\theta$ is trained. However, the assumption does not make much sense to me when the parameters are arbitrary and the model has not been trained at all. Is my interpretation correct? Have I missed something? If not, how do we justify $\epsilon^{(i)}$ when the model is inaccurate (not trained)?

Thanks in advance.
