
Does finding the slope/intercept using the formula for m and b always give the best-fit line in linear regression?

Data Science, asked on February 3, 2021

In linear regression we have to fit different lines and choose the one with minimum error, so what is the purpose of having a formula for m and b that gives the slope and intercept of the regression line, when it cannot give the best-fit line directly?

[Image: the formula for the slope m and intercept b]

1. Suppose I apply the values in my dataset to the formula for m and b and obtain the regression line yhat = 17.5835x + 6, and assume the error calculated for this line is 3.

2. Suppose I fit another line at random (not using the formula for m and b; say the random line has m = 16 and b = 3), so my second regression line is yhat = 16x + 3, and assume the error calculated for this line is 1.5.

Linear regression goal: choose the best-fit line, i.e. the line with minimum error.

So in this case my second line is better than the first line.

What is the point of having a formula that gives values for the slope m and intercept b when it cannot give the best-fit line directly?

Or is my understanding incorrect: does finding the slope/intercept using the formula for m and b always give the best-fit line?

If yes, then there is no need to try multiple lines, calculate their errors, and choose the line with minimum error.

If no, then what is the point of having a formula for the slope m and intercept b when it cannot give the best-fit line? Does that mean the maths/stats community needs to change this formula for the slope and intercept?

2 Answers

In linear regression you can choose between calculating the optimal weights exactly using the normal equation or approximating them using gradient descent.

  1. Normal equation: The optimal weights of linear regression can be calculated using: $$ w_{optim} = (X^T * X)^{-1} * X^T * y $$ The first element of $w_{optim}$ is the intercept, and the first column of $X$ consists entirely of ones.
    Let's break this down. $m$ is the number of observations (rows) in the input matrix $X$ and $n$ is the number of features. So $X$ has shape $(m, n + 1)$, since its first column contains only ones.
    $y$ is the column vector holding your labels; it has shape $(m, 1)$. $X^T$ is the transpose of $X$ (rows and columns swapped) and $*$ is just the dot product. Writing only the shapes of the matrices shows that $w_{optim}$ ends up with shape $(n + 1, 1)$: $$ w_{optim} = ((m, n + 1)^T * (m, n + 1))^{-1} * (m, n + 1)^T * (m, 1) $$ $$ w_{optim} = ((n + 1, m) * (m, n + 1))^{-1} * (m, n + 1)^T * (m, 1) $$ $$ w_{optim} = ((n + 1, n + 1))^{-1} * (m, n + 1)^T * (m, 1) $$ $$ w_{optim} = (n + 1, n + 1) * (m, n + 1)^T * (m, 1) $$ $$ w_{optim} = (n + 1, n + 1) * (n + 1, m) * (m, 1) $$ $$ w_{optim} = (n + 1, m) * (m, 1) $$ $$ w_{optim} = (n + 1, 1) $$ However, since you have to invert a matrix of shape $(n + 1, n + 1)$, where $n$ is the number of features in $X$, this becomes expensive when the number of features is large. If $X$ has $999$ features, for example, you have to invert a matrix with $1000 * 1000 = 1{,}000{,}000$ entries. Matrix inversion is $O(n^3)$ in the matrix dimension, so that is roughly $1000^3 = 10^9$ operations.
  2. Gradient descent: This only approximates the optimal weights, but it is computationally cheaper when the number of features is large. I won't explain it here; there are plenty of tutorials online. A short NumPy sketch at the end of this answer illustrates both approaches.

I don't know the formula you posted; it is probably the normal equation for linear regression with only one feature.
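
To make the two approaches concrete, here is a minimal NumPy sketch (not part of the original answer) on a made-up single-feature dataset; the data, learning rate, and iteration count are illustrative assumptions.

```python
# A minimal sketch of both approaches on a synthetic single-feature dataset.
# The data, learning rate, and iteration count below are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
m_obs = 100                                   # number of observations
x = rng.uniform(0, 10, size=(m_obs, 1))
y = 3.0 * x + 2.0 + rng.normal(0, 1, size=(m_obs, 1))   # true slope 3, intercept 2

X = np.hstack([np.ones((m_obs, 1)), x])       # prepend a column of ones for the intercept

# 1. Normal equation: w = (X^T X)^{-1} X^T y
w_normal = np.linalg.inv(X.T @ X) @ X.T @ y   # w_normal[0] = intercept, w_normal[1] = slope

# 2. Gradient descent: repeatedly step against the gradient of the mean squared error
w_gd = np.zeros((2, 1))
learning_rate = 0.01
for _ in range(10_000):
    gradient = (2 / m_obs) * X.T @ (X @ w_gd - y)
    w_gd -= learning_rate * gradient

print("normal equation :", w_normal.ravel())  # both should be close to [2, 3]
print("gradient descent:", w_gd.ravel())
```

In practice you would solve the linear system with np.linalg.lstsq (or np.linalg.solve) rather than explicitly inverting $X^T X$, but the explicit inverse mirrors the formula above.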

Answered by Tim von Känel on February 3, 2021

The formulas you mention give the coefficients of the line of best fit. The values are derived using the least squares method, where the goal is to minimize the sum of squared errors. The following is the derivation of the values of m and b.
Let the line of best fit be $$\hat{y} = m x + b$$
We then find the coefficients $m$ and $b$ that minimize the sum of squared errors between the actual value $y$ and the fitted value $\hat{y}$:
\begin{align} SSE &= \sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^2 \\ &= \sum_{i=1}^{n}(y_{i}-m x_{i}-b)^2 \end{align}
Taking the partial derivative of SSE with respect to $b$ and setting it to zero:
\begin{align} \frac{\partial SSE}{\partial b} &= \sum_{i=1}^{n}-2(y_{i}-m x_{i}-b) \\ 0 &= \sum_{i=1}^{n}-2(y_{i}-m x_{i}-b) \end{align}
Therefore we get $b$ as $$ b = \bar{y} - m\bar{x}$$
Similarly, to find $m$ we take the partial derivative of SSE with respect to $m$ and set it to zero:
\begin{align} \frac{\partial SSE}{\partial m} &= \sum_{i=1}^{n}-2x_{i}(y_{i}-m x_{i}-b) \\ 0 &= \sum_{i=1}^{n}-2x_{i}(y_{i}-m x_{i}-b) \\ 0 &= \sum_{i=1}^{n}x_{i}(y_{i}-m x_{i}-b) \\ 0 &= \sum_{i=1}^{n}x_{i}y_{i} - m\sum_{i=1}^{n}x_{i}^2 - b\sum_{i=1}^{n}x_{i} \end{align}
Substituting $b$ and solving for $m$ gives $$m = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
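
Because m and b are obtained by setting the partial derivatives of the SSE to zero, and the SSE is a convex function of m and b, no other slope/intercept pair can give a smaller sum of squared errors. Below is a small numerical sketch of that claim (the data and helper function are my own illustrative assumptions, not part of the answer):

```python
# Compute m and b with the closed-form least-squares formulas and check that
# perturbing them never lowers the sum of squared errors (synthetic data).
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 16.0 * x + 3.0 + rng.normal(0, 2, size=50)    # noisy line with slope 16, intercept 3

n = len(x)
m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
b = np.mean(y) - m * np.mean(x)

def sse(slope, intercept):
    """Sum of squared errors of the line y = slope * x + intercept."""
    return np.sum((y - (slope * x + intercept)) ** 2)

best = sse(m, b)
for _ in range(1000):                             # compare against 1000 random alternative lines
    assert sse(m + rng.normal(), b + rng.normal()) >= best
print(f"m = {m:.3f}, b = {b:.3f}, SSE = {best:.3f}")
```

So, returning to the question's example: when the error is measured as the sum of squared errors, a line chosen by hand can never beat the line produced by these formulas.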

Answered by Ankita Talwar on February 3, 2021
