Data Science Asked by stk1234 on January 17, 2021
I’m just starting to learn about linear regression and was wondering why we opt to minimize the sum of squared errors. I understand that squaring keeps positive and negative errors from cancelling out (so if e1 = -2 and e2 = 4, we treat them as distances of 2 and 4 before squaring them), but I wonder why we don’t minimize the sum of absolute values instead. If you square the errors, e2 makes a relatively larger individual contribution to the quantity being minimized than e1 does, compared with just the absolute values, and do we want that?

I also wonder about errors smaller than 1. For instance, if e1 = 0.5 and e2 = 1.05, e1 is weighted less when squared because 0.25 is less than 0.5, while e2 is weighted more (1.1025 versus 1.05). Lastly, there is the case of e1 = 0.5 and e2 = 0.2: e1 is further away to start with, but after squaring we compare 0.25 with 0.04. Anyway, I’m just wondering why we do sum-of-squared-error minimization rather than absolute-value minimization.
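To make the weighting concrete, here is a quick Python check of the example errors mentioned in the question, showing how each one contributes under an absolute-value penalty versus a squared penalty:

```python
# Contributions of the example errors under |e| versus e^2.
errors = [-2, 4, 0.5, 1.05, 0.2]

for e in errors:
    print(f"error = {e:>5}: |e| = {abs(e):.4f}, e^2 = {e ** 2:.4f}")

# Errors larger than 1 are amplified by squaring (4 -> 16),
# while errors smaller than 1 are shrunk (0.5 -> 0.25, 0.2 -> 0.04).
print("sum of |e| :", sum(abs(e) for e in errors))
print("sum of e^2 :", sum(e ** 2 for e in errors))
```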
A simple Google search for "stats why regression not absolute difference" would give you good answers. Try it yourself!
To summarise quickly: minimizing the sum of squared errors corresponds to the L2 norm of the residuals, and minimizing the sum of absolute errors to the L1 norm. You might want to read about L1 vs L2:
https://stats.stackexchange.com/questions/45643/why-l1-norm-for-sparse-models
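To see what the L1 vs L2 choice means in practice, here is a minimal sketch (with made-up toy data, not taken from the question) that fits a straight line by minimizing the sum of squared errors and, separately, the sum of absolute errors, using scipy.optimize.minimize:

```python
# Minimal comparison of an L2 (least squares) fit and an L1 (least
# absolute deviations) fit on toy data containing one large outlier.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)
y[-1] += 15.0  # a single large outlier

def l2_loss(params):
    slope, intercept = params
    return np.sum((y - (slope * x + intercept)) ** 2)   # sum of squared errors

def l1_loss(params):
    slope, intercept = params
    return np.sum(np.abs(y - (slope * x + intercept)))  # sum of absolute errors

fit_l2 = minimize(l2_loss, x0=[0.0, 0.0])
fit_l1 = minimize(l1_loss, x0=[0.0, 0.0], method="Nelder-Mead")

print("L2 fit (slope, intercept):", fit_l2.x)
print("L1 fit (slope, intercept):", fit_l1.x)
```

Because squaring penalizes the outlier's large residual much more heavily, the L2 fit gets pulled toward it, while the L1 fit is more robust; on the other hand, the squared loss is differentiable everywhere and has a closed-form solution, which is part of why it is the usual default.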
Correct answer by SmallChess on January 17, 2021
A question similar to this has already been asked on Cross Validated. See:
Why squared residuals instead of absolute residuals in OLS estimation? and
Why square the difference instead of taking the absolute value in standard deviation?.
The former is actually a duplicate of the latter.
You may also benefit from the answer to this post
Answered by Jon on January 17, 2021