Data Science Asked by stk1234 on January 17, 2021
I’m just starting to learn about linear regression and was wondering why we opt to minimize the sum of squared errors. I understand that squaring keeps positive and negative errors from cancelling out (so if e1 = -2 and e2 = 4, we treat them as distances of 2 and 4 before squaring them), but I wonder why we don’t minimize the sum of absolute values instead. If you square the errors, e2 makes a relatively larger individual contribution to the quantity being minimized than e1 does, compared with just the absolute values, and do we want that?

I also wonder about errors smaller than 1. For instance, if e1 = 0.5 and e2 = 1.05, e1 is weighted less when squared because 0.25 is less than 0.5, while e2 is weighted more (1.1025 versus 1.05). Lastly, there is the case of e1 = 0.5 and e2 = 0.2: e1 is further away to start with, but after squaring we compare 0.25 with 0.04. Anyway, I’m just wondering why we do sum-of-squared-error minimization rather than absolute-value minimization.
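To make the weighting concrete, here is a quick Python check of the example errors mentioned in the question, showing how each one contributes under an absolute-value penalty versus a squared penalty:

```python
# Contributions of the example errors under |e| versus e^2.
errors = [-2, 4, 0.5, 1.05, 0.2]

for e in errors:
    print(f"error = {e:>5}: |e| = {abs(e):.4f}, e^2 = {e ** 2:.4f}")

# Errors larger than 1 are amplified by squaring (4 -> 16),
# while errors smaller than 1 are shrunk (0.5 -> 0.25, 0.2 -> 0.04).
print("sum of |e| :", sum(abs(e) for e in errors))
print("sum of e^2 :", sum(e ** 2 for e in errors))
```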
A simple Google search for "stats why regression not absolute difference" would give you good answers. Try it yourself!
To summarise quickly: minimizing the sum of squared errors corresponds to the L2 norm of the residuals, and minimizing the sum of absolute errors to the L1 norm. You might want to read about L1 vs L2:
https://stats.stackexchange.com/questions/45643/why-l1-norm-for-sparse-models
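To see what the L1 vs L2 choice means in practice, here is a minimal sketch (with made-up toy data, not taken from the question) that fits a straight line by minimizing the sum of squared errors and, separately, the sum of absolute errors, using scipy.optimize.minimize:

```python
# Minimal comparison of an L2 (least squares) fit and an L1 (least
# absolute deviations) fit on toy data containing one large outlier.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)
y[-1] += 15.0  # a single large outlier

def l2_loss(params):
    slope, intercept = params
    return np.sum((y - (slope * x + intercept)) ** 2)   # sum of squared errors

def l1_loss(params):
    slope, intercept = params
    return np.sum(np.abs(y - (slope * x + intercept)))  # sum of absolute errors

fit_l2 = minimize(l2_loss, x0=[0.0, 0.0])
fit_l1 = minimize(l1_loss, x0=[0.0, 0.0], method="Nelder-Mead")

print("L2 fit (slope, intercept):", fit_l2.x)
print("L1 fit (slope, intercept):", fit_l1.x)
```

Because squaring penalizes the outlier's large residual much more heavily, the L2 fit gets pulled toward it, while the L1 fit is more robust; on the other hand, the squared loss is differentiable everywhere and has a closed-form solution, which is part of why it is the usual default.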
Correct answer by SmallChess on January 17, 2021
A question similar to this has already been asked on Cross Validated. See:
Why squared residuals instead of absolute residuals in OLS estimation? and
Why square the difference instead of taking the absolute value in standard deviation?.
The former is actually a duplicate of the latter.
You may also benefit from the answer to this post
Answered by Jon on January 17, 2021