TransWikia.com

Don't understand why I get an inverse ROC curve for SVM (Python)

Data Science Asked on May 5, 2021

I built an SVM classifier but get an inverted ROC curve: the AUC is only 0.08. I’ve used the same datasets to build a Logistic Regression classifier and a Decision Tree classifier, and their ROC curves look good.

Here is my code for the SVM:

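import matplotlib.pyplot as plt
from sklearn.metrics import plot_roc_curve  # removed in scikit-learn 1.2; use RocCurveDisplay.from_estimator there
from sklearn.svm import SVC

# train_x_sm, train_y_sm, test_x, test_y are my own training and test data (not shown)
svm = SVC(max_iter=12, probability=True)
svm.fit(train_x_sm, train_y_sm)
svm_test_y = svm.predict(X=test_x)
svm_roc = plot_roc_curve(svm, test_x, test_y)
plt.show()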

Could anyone tell me what is wrong with my code?

2 Answers

For any binary classification problem, an AUC below 0.5 means the classifier ranks the test samples worse than random guessing (AUC = 0.5).

Possible reasons:

  • Your classifier is overfitted to the training set and performs very poorly on the test set.
  • Your test set might be very small.
  • Your classifier is giving you the probability that the class is -1. You then get scores close to 0 for class +1 samples and close to 1 for class -1 samples. If your ROC method expects positive (+1) samples to score higher than negative (-1) ones, you get a reversed curve.
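To check the third point, look at which class each column of predict_proba refers to. The minimal sketch below assumes scikit-learn's breast cancer dataset as a stand-in for the asker's data and a scaled SVC; it shows that scoring with the probability of the other class produces a mirrored ROC curve whose AUC is (up to rounding) 1 minus the correct one.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): the asker's dataset is not available
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))
model.fit(X_train, y_train)

# Column i of predict_proba is the probability of model.classes_[i]
print(model.classes_)  # [0 1]
proba = model.predict_proba(X_test)

# With 0/1 labels, roc_auc_score treats label 1 as the positive class
auc_right = roc_auc_score(y_test, proba[:, 1])  # P(class 1): the intended score
auc_wrong = roc_auc_score(y_test, proba[:, 0])  # P(class 0): mirrored ROC curve

print(f"AUC using P(class 1): {auc_right:.3f}")
print(f"AUC using P(class 0): {auc_wrong:.3f}")  # roughly 1 - auc_right
```

If your own curve looks like the second case, the scores you pass in belong to the wrong class.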

In that case, a valid strategy is simply to invert the predictions:

invert_prob = 1 - prob
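Why inverting works follows from the rank interpretation of AUC: it is the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counting half. Writing $s^{+}$ and $s^{-}$ for those two scores (notation introduced here), replacing every score $s$ with $1 - s$ flips every strict comparison and leaves ties unchanged:

```latex
\begin{aligned}
\mathrm{AUC} &= P(s^{+} > s^{-}) + \tfrac{1}{2}\,P(s^{+} = s^{-}) \\
\mathrm{AUC}_{\text{inv}} &= P(1 - s^{+} > 1 - s^{-}) + \tfrac{1}{2}\,P(1 - s^{+} = 1 - s^{-}) \\
&= P(s^{+} < s^{-}) + \tfrac{1}{2}\,P(s^{+} = s^{-}) \\
&= 1 - \mathrm{AUC}
\end{aligned}
```

So, if the scores really are just reversed, an AUC of 0.08 would become $1 - 0.08 = 0.92$ after inversion.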

Reference: ROC

Answered by prashant0598 on May 5, 2021

One potential fix is to remove max_iter = 12, so that scikit-learn falls back to its default max_iter=-1 (no iteration limit). Stopping the solver after so few iterations can lead to very poor scores, as the following example shows:

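import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import plot_roc_curve  # removed in scikit-learn 1.2; use RocCurveDisplay.from_estimator there
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

data = load_breast_cancer()
# No random_state: every run uses a different train/test split
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2)

# Stop the SVM solver after only 12 iterations
model = SVC(max_iter=12, probability=True)
model.fit(X_train, y_train)

plot_roc_curve(model, X_test, y_test)
plt.show()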

results in

[Figure: ROC curve with max_iter=12, first run]

However, running exactly the same code again (still with max_iter=12) gives a completely different result:

[Figure: ROC curve with max_iter=12, second run]

After removing max_iter=12, the code consistently produces much higher AUCs, around $0.95$ to $0.99$.
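The run-to-run differences in the example are at least partly explained by the missing seeds: train_test_split shuffles differently each time, and with probability=True SVC calibrates its probabilities through an internal cross-validation whose shuffling is controlled by random_state. The minimal sketch below fixes both seeds and compares the capped and uncapped solver on the same breast cancer data; the helper heldout_auc is hypothetical, written only for this illustration. It also reads SVC's fit_status_ attribute, which is 0 when the solver converged and 1 when it stopped early.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
# Fixing random_state makes the train/test split identical on every run
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)


def heldout_auc(max_iter):
    """Hypothetical helper: fit an SVC with the given iteration cap, return (test AUC, fit_status_)."""
    # random_state seeds the internal cross-validation behind predict_proba
    model = SVC(max_iter=max_iter, probability=True, random_state=0)
    model.fit(X_train, y_train)  # emits a ConvergenceWarning if the solver stops early
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return auc, model.fit_status_


results = {max_iter: heldout_auc(max_iter) for max_iter in (12, -1)}
for max_iter, (auc, status) in results.items():
    # fit_status_ is 0 when the solver converged, 1 when it hit max_iter
    print(f"max_iter={max_iter}: AUC={auc:.3f}, fit_status_={status}")
```

On scikit-learn 1.0 and later, RocCurveDisplay.from_estimator(model, X_test, y_test) draws the same kind of curve as plot_roc_curve, which was removed in version 1.2.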

Answered by Sammy on May 5, 2021

© 2024 TransWikia.com. All rights reserved.