Data Science Asked by Manas Tripathi on December 4, 2020
So I recently started Andrew Ng's ML Course, and this is the formula Andrew lays out for the gradient-descent update on a linear model.
$$ \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \qquad \text{(simultaneously update } \theta_j \text{ for all } j\text{)} $$
As we can see, the formula asks us to sum over all the rows in the data.
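For reference (my notation, not from the course slide): stacking the $m$ examples as rows of a design matrix $X$, the same update for all $j$ at once can be written as
$$ \theta := \theta - \frac{\alpha}{m} X^\top (X\theta - y), $$
where the sum over $i$ is carried out by the matrix product with $X^\top$.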
However, the code below doesn't work if I apply np.sum():
def gradientDescent(X, y, theta, alpha, num_iters):
    # Initialize some useful values
    m = y.shape[0]  # number of training examples

    # Make a copy of theta to avoid changing the original array, since numpy
    # arrays are passed by reference to functions
    theta = theta.copy()

    J_history = []  # use a Python list to save the cost in every iteration

    for i in range(num_iters):
        temp = np.dot(X, theta) - y
        temp = np.dot(X.T, temp)
        theta = theta - ((alpha / m) * np.sum(temp))

        # Save the cost J in every iteration
        J_history.append(computeCost(X, y, theta))

    return theta, J_history
On the other hand, if I get rid of np.sum(), the code works perfectly:
def gradientDescent(X, y, theta, alpha, num_iters):
    # Initialize some useful values
    m = y.shape[0]  # number of training examples

    # Make a copy of theta to avoid changing the original array, since numpy
    # arrays are passed by reference to functions
    theta = theta.copy()

    J_history = []  # use a Python list to save the cost in every iteration

    for i in range(num_iters):
        temp = np.dot(X, theta) - y
        temp = np.dot(X.T, temp)
        theta = theta - ((alpha / m) * temp)

        # Save the cost J in every iteration
        J_history.append(computeCost(X, y, theta))

    return theta, J_history
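For anyone who wants to run this: computeCost isn't shown in the post. A minimal stand-in, assuming the usual squared-error cost from the course, plus a toy run of the working version (made-up data):

import numpy as np

def computeCost(X, y, theta):
    # Squared-error cost J = 1/(2m) * sum((X.theta - y)^2)
    # (assumed here -- the original post does not show this function)
    m = y.shape[0]
    residuals = np.dot(X, theta) - y
    return np.dot(residuals, residuals) / (2 * m)

# Toy data (made up): fit y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # column of ones for the intercept
y = 1 + 2 * x

theta, J_history = gradientDescent(X, y, np.zeros(2), alpha=0.1, num_iters=1000)
print(theta)  # converges toward [1., 2.]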
Can someone please explain this?
Your goal is to compute the gradient for the whole $\theta$ vector of size $p$ (the number of variables). Your temp is also a vector of size $p$, containing the gradient of the cost function with respect to each of your $\theta$ values. Note that the sum over the $m$ training examples in the formula has already been performed inside np.dot(X.T, temp): the matrix product sums over the rows for you.
Therefore, you want to subtract the two vectors point-wise (scaled by the learning rate $\alpha$) to make the update, so there is no reason to sum the vector. Applying np.sum() collapses the $p$ gradient values into a single scalar, and every component of $\theta$ then receives the same (meaningless) update.
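A quick way to see this (a minimal sketch with made-up shapes, not part of the original answer):

import numpy as np

# Made-up shapes: m = 5 training examples, p = 3 parameters
X = np.ones((5, 3))
y = np.ones(5)
theta = np.zeros(3)

temp = np.dot(X, theta) - y  # shape (5,): one residual per training example
grad = np.dot(X.T, temp)     # shape (3,): one gradient value per parameter

print(grad.shape)            # (3,) -> matches theta, so theta - (alpha/m) * grad works
print(np.sum(grad).shape)    # ()   -> a scalar: every theta_j would get the same update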
Answered by Elliot on December 4, 2020