Data Science Asked on June 2, 2021
I am trying to implement a logistic regression algorithm using sklearn. When I print the accuracy, it prints a negative value.
Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import scipy
from scipy.stats import spearmanr
import sklearn
from sklearn.preprocessing import scale
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn import preprocessing

# load the data (the file has no header row)
address = "ex2data2.txt"
student = pd.read_csv(address, header=None)
student.columns = ['score1', 'score2', 'res']
# print(student.head())

# separate input and output data (.ix was removed from pandas; use .iloc)
X = student.iloc[:, [0, 1]].values
student_data_names = ['score1', 'score2']
y = student.iloc[:, 2].values

# check missing values
# print(student.isnull().sum())
# check whether the output contains anything other than 0 or 1
# plt.show(sb.countplot(x='res', data=student))
# print(student.info())

X = scale(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
LogReg = LogisticRegression(C=2.0)
LogReg.fit(X_train, y_train)
dat1 = LogReg.predict(X_test)
print(r2_score(y_test, dat1))
Output:
-1.1818181818181817
Each time I run it the output is different, but it is always negative. How can I get a correct accuracy result?
Perhaps try a different algorithm/model or tune the parameters.
It is possible for r2_score to be negative.
As mentioned in the Wikipedia page on the coefficient of determination:
There are cases where the computational definition of $R^2$ can yield negative values, depending on the definition used. This can arise when the predictions that are being compared to the corresponding outcomes have not been derived from a model-fitting procedure using those data. Even if a model-fitting procedure has been used, $R^2$ may still be negative, for example when linear regression is conducted without including an intercept, or when a non-linear function is used to fit the data. In cases where negative values arise, the mean of the data provides a better fit to the outcomes than do the fitted function values, according to this particular criterion. Since the most general definition of the coefficient of determination is also known as the Nash–Sutcliffe model efficiency coefficient, this last notation is preferred in many fields, because denoting a goodness-of-fit indicator that can vary from $-\infty$ to $1$ (i.e., it can yield negative values) with a squared letter is confusing.
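A quick way to see this: when the predictions fit worse than simply predicting the mean of y_true, r2_score comes out negative. The toy labels below are illustrative only, not taken from the question's data:

```python
from sklearn.metrics import r2_score

# Toy 0/1 labels (illustrative only): every prediction is wrong,
# so the predictions fit worse than the mean of y_true and R^2 < 0
y_true = [0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0]
print(r2_score(y_true, y_pred))  # a negative value

# Perfect predictions give the maximum score of 1.0
print(r2_score(y_true, y_true))
```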
Answered by Siong Thye Goh on June 2, 2021
This has already been answered on Stack Overflow. In summary: make sure you are calling sklearn.metrics.r2_score with its arguments in the documented order, i.e. r2_score(y_true, y_pred).
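Since the target here is a 0/1 class label, a classification metric such as accuracy_score (or the estimator's own score method) is the usual choice instead of r2_score. A minimal sketch; the make_classification call is a synthetic stand-in for ex2data2.txt, not the actual file:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, r2_score

# Synthetic two-feature binary data (illustrative stand-in for ex2data2.txt)
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(C=2.0).fit(X_train, y_train)
pred = clf.predict(X_test)

# r2_score is a regression metric; on 0/1 labels it can easily be negative
print("r2:", r2_score(y_test, pred))

# accuracy_score is the classification metric; always between 0 and 1
print("accuracy:", accuracy_score(y_test, pred))
# equivalently: clf.score(X_test, y_test)
```

Unlike r2_score, accuracy is bounded in [0, 1], so it can never print a negative value.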
Answered by noe on June 2, 2021