
Does finding the slope/intercept using the formula for m and b always give the best-fit line in linear regression?

Data Science, asked on February 3, 2021

In linear regression we have to fit different lines and choose the one with minimum error, so what is the purpose of having a formula for m and b that gives the slope and intercept of the regression line, when it cannot give the best-fit line directly?

[Image: the formula for the slope m and intercept b]

1. Suppose I apply the values in my dataset to the formula for m and b and obtain the regression line yhat = 17.5835x + 6, and assume the error calculated for this line is 3.

2. Suppose I fit another line at random (not using the formula for m and b; say the random line has m = 16 and b = 3), so my second regression line is yhat = 16x + 3, and assume the error calculated for this line is 1.5.

Linear regression goal: choose the best-fit line, i.e. the line with minimum error.

So in this case my second line is better than the first line.

What is the point of having a formula that gives values for the slope m and intercept b when it cannot give the best-fit line directly?

Or is my understanding incorrect: does finding the slope/intercept using the formula for m and b always give the best-fit line?

If yes, then there is no need to try multiple lines, calculate their errors, and choose the line with minimum error.

If no, then what is the point of having a formula for the slope m and intercept b when it cannot give the best-fit line? Does that mean the maths/stats community needs to change this formula for the slope and intercept?

2 Answers

In linear regression you can choose between calculating the optimal weights exactly using the normal equation or approximating them using gradient descent.

  1. Normal equation: The optimal weights of linear regression can be calculated using: $$ w_{optim} = (X^T * X)^{-1} * X^T * y $$ The first element of $w_{optim}$ is the intercept, and the first column of $X$ consists entirely of ones.
    Let's break this down. $m$ is the number of observations (rows) in the input matrix $X$ and $n$ is the number of features. So $X$ has shape $(m, n + 1)$, since its first column contains only ones.
    $y$ is the column vector holding your labels; it has shape $(m, 1)$. $X^T$ is the transpose of $X$ (rows and columns swapped) and $*$ is just the dot product. Writing only the shapes of the matrices shows that $w_{optim}$ ends up with shape $(n + 1, 1)$: $$ w_{optim} = ((m, n + 1)^T * (m, n + 1))^{-1} * (m, n + 1)^T * (m, 1) $$ $$ w_{optim} = ((n + 1, m) * (m, n + 1))^{-1} * (m, n + 1)^T * (m, 1) $$ $$ w_{optim} = ((n + 1, n + 1))^{-1} * (m, n + 1)^T * (m, 1) $$ $$ w_{optim} = (n + 1, n + 1) * (m, n + 1)^T * (m, 1) $$ $$ w_{optim} = (n + 1, n + 1) * (n + 1, m) * (m, 1) $$ $$ w_{optim} = (n + 1, m) * (m, 1) $$ $$ w_{optim} = (n + 1, 1) $$ However, since you have to invert a matrix of shape $(n + 1, n + 1)$, where $n$ is the number of features in $X$, this becomes expensive when the number of features is large. If $X$ has $999$ features, for example, you have to invert a matrix with $1000 * 1000 = 1{,}000{,}000$ entries. Matrix inversion is $O(n^3)$ in the matrix dimension, so that is roughly $1000^3 = 10^9$ operations.
  2. Gradient descent: This only approximates the optimal weights, but it is computationally cheaper when the number of features is large. I won't explain it here; there are plenty of tutorials online. A short NumPy sketch at the end of this answer illustrates both approaches.

I don't know the formula you posted; it is probably the normal equation for linear regression with only one feature.
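
To make the two approaches concrete, here is a minimal NumPy sketch (not part of the original answer) on a made-up single-feature dataset; the data, learning rate, and iteration count are illustrative assumptions.

```python
# A minimal sketch of both approaches on a synthetic single-feature dataset.
# The data, learning rate, and iteration count below are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
m_obs = 100                                   # number of observations
x = rng.uniform(0, 10, size=(m_obs, 1))
y = 3.0 * x + 2.0 + rng.normal(0, 1, size=(m_obs, 1))   # true slope 3, intercept 2

X = np.hstack([np.ones((m_obs, 1)), x])       # prepend a column of ones for the intercept

# 1. Normal equation: w = (X^T X)^{-1} X^T y
w_normal = np.linalg.inv(X.T @ X) @ X.T @ y   # w_normal[0] = intercept, w_normal[1] = slope

# 2. Gradient descent: repeatedly step against the gradient of the mean squared error
w_gd = np.zeros((2, 1))
learning_rate = 0.01
for _ in range(10_000):
    gradient = (2 / m_obs) * X.T @ (X @ w_gd - y)
    w_gd -= learning_rate * gradient

print("normal equation :", w_normal.ravel())  # both should be close to [2, 3]
print("gradient descent:", w_gd.ravel())
```

In practice you would solve the linear system with np.linalg.lstsq (or np.linalg.solve) rather than explicitly inverting $X^T X$, but the explicit inverse mirrors the formula above.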

Answered by Tim von Känel on February 3, 2021

The formulas you mention give the coefficients of the line of best fit. The values are derived using the least squares method, where the goal is to minimize the sum of squared errors. The following is the derivation of the values of m and b.
Let the line of best fit be $$\hat{y} = m x + b$$
We then find the coefficients $m$ and $b$ that minimize the sum of squared errors between the actual value $y$ and the fitted value $\hat{y}$:
\begin{align} SSE &= \sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^2 \\ &= \sum_{i=1}^{n}(y_{i}-m x_{i}-b)^2 \end{align}
Taking the partial derivative of SSE with respect to $b$ and setting it to zero:
\begin{align} \frac{\partial SSE}{\partial b} &= \sum_{i=1}^{n}-2(y_{i}-m x_{i}-b) \\ 0 &= \sum_{i=1}^{n}-2(y_{i}-m x_{i}-b) \end{align}
Therefore we get $b$ as $$ b = \bar{y} - m\bar{x}$$
Similarly, to find $m$ we take the partial derivative of SSE with respect to $m$ and set it to zero:
\begin{align} \frac{\partial SSE}{\partial m} &= \sum_{i=1}^{n}-2x_{i}(y_{i}-m x_{i}-b) \\ 0 &= \sum_{i=1}^{n}-2x_{i}(y_{i}-m x_{i}-b) \\ 0 &= \sum_{i=1}^{n}x_{i}(y_{i}-m x_{i}-b) \\ 0 &= \sum_{i=1}^{n}x_{i}y_{i} - m\sum_{i=1}^{n}x_{i}^2 - b\sum_{i=1}^{n}x_{i} \end{align}
Substituting $b$ and solving for $m$ gives $$m = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
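
Because m and b are obtained by setting the partial derivatives of the SSE to zero, and the SSE is a convex function of m and b, no other slope/intercept pair can give a smaller sum of squared errors. Below is a small numerical sketch of that claim (the data and helper function are my own illustrative assumptions, not part of the answer):

```python
# Compute m and b with the closed-form least-squares formulas and check that
# perturbing them never lowers the sum of squared errors (synthetic data).
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 16.0 * x + 3.0 + rng.normal(0, 2, size=50)    # noisy line with slope 16, intercept 3

n = len(x)
m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
b = np.mean(y) - m * np.mean(x)

def sse(slope, intercept):
    """Sum of squared errors of the line y = slope * x + intercept."""
    return np.sum((y - (slope * x + intercept)) ** 2)

best = sse(m, b)
for _ in range(1000):                             # compare against 1000 random alternative lines
    assert sse(m + rng.normal(), b + rng.normal()) >= best
print(f"m = {m:.3f}, b = {b:.3f}, SSE = {best:.3f}")
```

So, returning to the question's example: when the error is measured as the sum of squared errors, a line chosen by hand can never beat the line produced by these formulas.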

Answered by Ankita Talwar on February 3, 2021
