
Logistic regression score is negative

Data Science Asked on June 2, 2021

I am trying to implement the logistic regression algorithm, using sklearn for this purpose. When I print the accuracy, it prints a negative value.

Code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import scipy
from scipy.stats import spearmanr
import sklearn
from sklearn.preprocessing import scale
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn import preprocessing

# load data and see
address = "ex2data2.txt"
student = pd.read_csv(address)
student.columns = ['score1', 'score2', 'res']
# print(student.head())

# separate input and output data
X = student.iloc[:, [0, 1]].values
student_data_names = ['score1', 'score2']

y = student.iloc[:, 2].values
# print("all done")

# check missing values
# print(student.isnull().sum())

# check if output contains other than 0 or 1
# plt.show(sb.countplot(x='res', data=student))

# print(student.info())

X = scale(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

LogReg = LogisticRegression(C=2.0)
LogReg.fit(X_train, y_train)
dat1 = LogReg.predict(X_test)

print(r2_score(y_test, dat1))

Output:

-1.1818181818181817

The output is different on each run, but it is negative every time. How can I get a correct accuracy result?

[image of dataset]

2 Answers

Perhaps try a different algorithm/model or tune the parameters.

It is possible for r2_score to be negative.

As mentioned in the Wikipedia page on the coefficient of determination:

There are cases where the computational definition of $R^2$ can yield negative values, depending on the definition used. This can arise when the predictions that are being compared to the corresponding outcomes have not been derived from a model-fitting procedure using those data. Even if a model-fitting procedure has been used, $R^2$ may still be negative, for example when linear regression is conducted without including an intercept, or when a non-linear function is used to fit the data. In cases where negative values arise, the mean of the data provides a better fit to the outcomes than do the fitted function values, according to this particular criterion. Since the most general definition of the coefficient of determination is also known as the Nash–Sutcliffe model efficiency coefficient, this last notation is preferred in many fields, because denoting a goodness-of-fit indicator that can vary from $-\infty$ to $1$ (i.e., it can yield negative values) with a squared letter is confusing.
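
You can reproduce this directly with sklearn: predictions that fit the targets worse than simply predicting their mean yield a negative score. A minimal sketch:

from sklearn.metrics import r2_score

# Predictions that are wrong on every binary label fit far worse than
# always predicting the mean (0.5), so the score drops below zero.
y_true = [0, 1, 0, 1]
y_pred = [1, 0, 1, 0]
print(r2_score(y_true, y_pred))  # -3.0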

Answered by Siong Thye Goh on June 2, 2021

This has already been answered on Stack Overflow. This is a summary of the answers:

  • $R^2$ is bounded above by 1.0, but it is not bounded below, so it is fine that you get negative values.
  • From your code, it seems you are invoking sklearn.metrics.r2_score correctly, i.e. r2_score(y_true, y_pred); keep in mind, though, that $R^2$ is a regression metric, and classification accuracy is usually measured differently (see the sketch after this list).
  • The cause may be in the data, e.g. if the mean of your test data is very different from the mean of the training data.
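
For a classifier such as LogisticRegression, the usual accuracy figure comes from sklearn.metrics.accuracy_score, or from the estimator's own .score method. A minimal sketch, reusing y_test, dat1, LogReg and X_test from the question's code:

from sklearn.metrics import accuracy_score

# Fraction of test labels predicted exactly right; always in [0, 1].
print(accuracy_score(y_test, dat1))

# Equivalent: for classifiers, .score() returns the mean accuracy.
print(LogReg.score(X_test, y_test))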

Some possibilities:

  • Try scaling your features to have mean 0 and variance 1 (see the sketch after this list).
  • Check the correlation between features and target to ensure you have something useful as input and not garbage.
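
A minimal sketch of both checks, reusing X_train, X_test and y_train from the question's code (note the scaler is fitted on the training split only, so no test-set information leaks into training):

from scipy.stats import spearmanr
from sklearn.preprocessing import StandardScaler

# Standardize using statistics estimated on the training split only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Rank correlation of each feature with the target; values near zero
# suggest the feature carries little signal for this problem.
for i, name in enumerate(['score1', 'score2']):
    rho, p = spearmanr(X_train[:, i], y_train)
    print(f"{name}: rho={rho:.3f}, p={p:.3f}")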

Answered by noe on June 2, 2021
