Data Science Asked by fpalka on May 7, 2021
Lately I have been playing with drawing non-linear decision boundaries using a logistic regression classifier. I used this notebook to learn how to create a proper plot. The author presents a really nice way to create a plot with the decision boundary on it: he adds polynomial features to the original dataset so that non-linear shapes can be drawn, and then draws a few plots for different values of the degree parameter (his polynomial features function works just like the one from sklearn).
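For reference, the kind of feature expansion I mean can be sketched with sklearn's PolynomialFeatures (the notebook uses its own equivalent function; the arrays here are just dummy values):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[0.5, -1.2],
              [1.0,  0.3]])           # two samples with features x1, x2
poly = PolynomialFeatures(degree=2)   # with degree=6 and two features this gives 28 columns
X_poly = poly.fit_transform(X)        # columns: 1, x1, x2, x1^2, x1*x2, x2^2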
I followed this notebook in my own environment. Instead of using the minimize function from the scipy.optimize package to find the best weights (as in the notebook), I used my own logistic regression implementation with a gradient descent optimization algorithm. For polynomial features with degree = 1 and degree = 2 my plots look exactly like the ones in the notebook. But for degree = 6 there is a difference. In the notebook the decision boundary is really curvy, which I would also expect, because generating more features leads to closer fitting to the training dataset. With my algorithm there is only a really small difference between degree = 2 and degree = 6.
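The minimize-based fit I compared against looks roughly like this (a sketch only, not the notebook's exact code; X_poly and y are dummy stand-ins for the real polynomial features and labels):

import numpy as np
from scipy.optimize import minimize

def cost(w, X, Y):
    # same unregularized cross-entropy cost as in my class below
    a = 1 / (1 + np.exp(-X.dot(w)))
    return (np.dot(-Y, np.log(a)) - np.dot(1 - Y, np.log(1 - a))) / X.shape[0]

X_poly = np.array([[1.0, 0.5, -1.2], [1.0, 1.0, 0.3], [1.0, -0.4, 0.8]])
y = np.array([1.0, 0.0, 1.0])

res = minimize(cost, x0=np.zeros(X_poly.shape[1]), args=(X_poly, y), method="TNC")
print(res.x)  # optimized weight vector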
Here is a comparison, where the first plot is from the notebook and the second one is mine.
I also ran the weights optimization with the minimize function and noticed that the obtained weights are very different from the ones I obtained using my own implementation.
Weights obtained from the minimize function:
[ 35.04544314 44.02433844 69.15673318 -343.69261527
-197.89262344 -183.96784381 -295.14107156 -620.36490083
-509.82244752 -327.88923523 1092.78139854 1267.14765298
1754.90950905 899.49504769 436.09827632 470.00768776
1233.10546815 1818.64194606 1926.06287183 1129.25103089
463.40430072 -1140.10610687 -2016.5976485 -3456.09451404
-3477.54092932 -3247.55983249 -1544.01162994 -510.04081232]
Weights obtained from my own implementation:
[ 3.88423676 2.61252164 4.41490058 -5.49636578 -5.70598669 -6.88741771
1.17688239 -1.44447375 0.24250327 -1.28316639 -4.71695206 1.22093586
-3.2154096 -2.74892981 -4.42193179 -0.68483587 -0.28642677 2.34231314
-2.35831624 -2.0624521 1.26929726 -4.5863475 1.65033913 -0.82767549
1.64820792 -2.03474736 -2.00457816 -0.50033884]
It can be noticed that the weights from the minimize function have much higher magnitudes. Why is there such a large difference between these two methods? Why are the weights obtained from the gradient descent algorithm so different? I thought that adding high-power polynomial terms would always make the decision boundary more curvy. Is there a mistake in my thinking?
I also attach my logistic regression implementation below. For the degree = 6 plot I used alpha=1 and epoch=3000.
import numpy as np


class LogisticRegressionModel:
    def __init__(self, alpha=0.05, epoch=100):
        self.__alpha = alpha
        self.__epoch = epoch
        self.__weights = []
        self.__errors = []

    def learn_and_fit(self, X, Y):
        # random initialization, one weight per feature column
        self.__weights = np.random.rand(X.shape[1], )
        m = X.shape[0]
        for _ in range(self.__epoch):
            J = self.cost_function(X, Y)
            activations = self.activation_function(X)
            # batch gradient of the cross-entropy cost
            gradient = 1 / m * np.dot(X.T, activations - Y)
            self.__weights = self.__weights - self.__alpha * gradient
            self.__errors.append(J)

    def cost_function(self, X, Y):
        # unregularized cross-entropy (log-loss), averaged over m samples
        m = X.shape[0]
        activations = self.activation_function(X)
        J = np.dot(-Y.T, np.log(activations)) - np.dot((1 - Y).T, np.log(1 - activations))
        return 1 / m * J

    def activation_function(self, X):
        Z = X.dot(self.__weights)
        activations = self.sigmoid(Z)
        return activations

    def sigmoid(self, Z):
        return 1 / (1 + np.exp(-Z))

    def predict(self, X):
        activations = self.activation_function(X)
        return np.where(activations < 0.5, 0, 1).flatten()

    def get_errors(self):
        return self.__errors

    def get_weights(self):
        return self.__weights
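For the degree = 6 plot I call it roughly like this (X_poly and y are again dummy stand-ins for the real polynomial features and labels):

X_poly = np.array([[1.0, 0.5, -1.2], [1.0, 1.0, 0.3], [1.0, -0.4, 0.8]])
y = np.array([1, 0, 1])

model = LogisticRegressionModel(alpha=1, epoch=3000)
model.learn_and_fit(X_poly, y)
print(model.get_weights())
predictions = model.predict(X_poly)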