
Plotting Polynomial Regression?

Data Science Asked by Greg Rosen on March 22, 2021

I’m reading through Hands-On Machine Learning with Scikit-Learn and TensorFlow by Géron. I am fitting a simple polynomial regression using sklearn’s PolynomialFeatures.

First, I generate X and y from NumPy random numbers so that the data follow a quadratic trend plus noise:

import numpy as np

m = 100  # number of samples
X = 6 * np.random.rand(m, 1) - 3  # uniform on [-3, 3)
y = 0.5 * X**2 + X + 2 + np.random.randn(m, 1)  # quadratic plus Gaussian noise

Then I plot the data as a scatterplot:

import matplotlib.pyplot as plt

plt.plot(X, y, "b.")  # blue dots, no connecting line
plt.xlabel("$x_1$", fontsize=18)
plt.ylabel("$y$", rotation=0, fontsize=18)
plt.axis([-3, 3, 0, 10])
plt.show()

Then I use PolynomialFeatures to add the second-degree (squared) feature:

from sklearn.preprocessing import PolynomialFeatures

poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)  # columns: x, x**2

Then I fit the LinearRegression:

from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
lin_reg.intercept_, lin_reg.coef_

Then I plot the same data together with the fitted quadratic curve. My question is about the following code:

X_new = np.linspace(-3, 3, 100).reshape(100, 1)
X_new_poly = poly_features.transform(X_new)
y_new = lin_reg.predict(X_new_poly)
plt.plot(X, y, "b.")
plt.plot(X_new, y_new, "r-", linewidth=2, label="Predictions")
plt.xlabel("$x_1$", fontsize=18)
plt.ylabel("$y$", rotation=0, fontsize=18)
plt.legend(loc="upper left", fontsize=14)
plt.axis([-3, 3, 0, 10])
plt.show()

Why do we create X_new (np.linspace(-3, 3, 100).reshape(100, 1)) and X_new_poly? Why does this not work with the X_poly that I’ve already created? (I tried plotting with the original X_poly and it definitely does not work: the line just zigzags up and down over and over. I’m not sure why this is the case.)

One Answer

You could use X and lin_reg.predict(X_poly), but

  1. plt.plot connects the points in the order they appear in X, and X is unsorted, so with a line connector the curve will appear to jump all over the place. You could fix this by using a scatterplot instead.
  2. It's vaguely disingenuous to use the training set's x-values for plotting the fitted curve; using np.linspace to get equally-spaced x-values is preferable (even if your original X was randomly generated and should fill the space reasonably well).
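
As a rough illustration of both points, here is a minimal sketch that rebuilds the data and model from the question and prepares two sets of x-values that are safe to draw with a line: the training points sorted by x, and an evenly spaced grid. The seeded generator, and the names order, X_sorted, y_fit and y_fit_sorted, are assumptions added for this example and are not part of the original code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Seeded generator (an assumption, used only to make the sketch reproducible).
rng = np.random.default_rng(42)
m = 100
X = 6 * rng.random((m, 1)) - 3
y = 0.5 * X**2 + X + 2 + rng.standard_normal((m, 1))

poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)
lin_reg = LinearRegression().fit(X_poly, y)

# Predictions in the original, random order of X.
# Drawing these with a line connector produces the zigzag.
y_fit = lin_reg.predict(X_poly)

# Option 1: reorder the training points by x so that consecutive
# points are neighbours on the x-axis.
order = np.argsort(X[:, 0])
X_sorted = X[order]
y_fit_sorted = y_fit[order]

# Option 2: evaluate the model on an evenly spaced grid,
# as the code in the question does.
X_new = np.linspace(-3, 3, 100).reshape(-1, 1)
y_new = lin_reg.predict(poly_features.transform(X_new))
```

With these arrays, either plt.plot(X_sorted, y_fit_sorted, "r-") or plt.plot(X_new, y_new, "r-") should draw a smooth curve, whereas plt.plot(X, y_fit, "r-") connects the points in random order. The grid version also covers [-3, 3] uniformly, regardless of where the training points happen to fall.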

Answered by Ben Reiniger on March 22, 2021
