Multiclass ROC Curve using DecisionTreeClassifier

Question

I built a DecisionTreeClassifier with custom parameters to understand what happens when I modify them and how the final model classifies the instances of the iris dataset. Now my task is to create a ROC curve taking each class in turn as the positive class (so my final graph needs three curves). To do this I instantiate a OneVsRestClassifier and pass the previous classifier as a parameter, so that it automatically picks up the parameters I modified (such as the class weights). This is my current code:
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import label_binarize
from sklearn.model_selection import train_test_split
from sklearn import tree 
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.metrics import confusion_matrix
import numpy as np
import graphviz

iris = load_iris()
X = iris.data
y = iris.target

# Binarize the output
y = label_binarize(y, classes=[0, 1, 2])
n_classes = y.shape[1]

# My DecisionTreeClassifier
clf = tree.DecisionTreeClassifier(criterion="entropy",random_state=300,min_samples_leaf=5,
                                  class_weight={0:1,1:10,2:10})

np.random.seed(0)

indices = np.random.permutation(len(iris.data))
indices_training=indices[:-10]
indices_test=indices[-10:]

iris_X_train = iris.data[indices_training]
iris_y_train = iris.target[indices_training]
iris_X_test  = iris.data[indices_test]
iris_y_test  = iris.target[indices_test]

# Training
clf = clf.fit(iris_X_train, iris_y_train)

# Test
predicted_y_test = clf.predict(iris_X_test)

print(confusion_matrix(iris_y_test, predicted_y_test))

print("Predictions:")
print(predicted_y_test)
print("True classes:")
print(iris_y_test)

# Learn to predict each class against the other
classifier = OneVsRestClassifier(clf)

# Train
classifier = classifier.fit(iris_X_train, iris_y_train)

# Test
y_score = classifier.predict_proba(iris_X_test)

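# Compute ROC curve and ROC area for each class.
# roc_curve expects binary labels and one column of scores, so binarize the
# test labels and compare class i against all the others.
iris_y_test_bin = label_binarize(iris_y_test, classes=[0, 1, 2])
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(iris_y_test_bin[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])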

My problem is that I get the error:

Class label 2 not present.

At the line: classifier = classifier.fit(iris_X_train, iris_y_train)
This is when I train the new classifier and I do not understand why. I checked on the iris dataset and there are three classes, so the label 2 should correspond to virginica, right?
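A likely explanation, stated as an assumption to check against the installed scikit-learn version: OneVsRestClassifier binarizes the target and fits a separate clone of the base estimator on each column, so every clone only sees the labels 0 ("rest") and 1 ("this class"). A class_weight dict keyed on the iris labels 0, 1, 2 therefore contains a key, 2, that never occurs in any sub-problem; older scikit-learn releases reject such a key with "Class label 2 not present.", while newer ones may ignore it silently. The sketch below shows the labels each sub-problem receives and, as a hypothetical alternative, a tree whose class_weight is keyed on the binary labels instead.

```python
# Sketch: which labels does each one-vs-rest sub-problem contain?
# Assumption: a recent scikit-learn; how class_weight keys that do not occur
# in y are handled at fit time differs between releases.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # y takes the values 0, 1, 2

# OneVsRestClassifier binarizes y and fits one clone per column, so each
# clone is trained on 0 ("rest") and 1 ("this class") only.
Y = LabelBinarizer().fit_transform(y)
for k in range(Y.shape[1]):
    print(f"sub-problem for class {k}: labels {np.unique(Y[:, k]).tolist()}")

# Hypothetical alternative: key the weights on the binary labels instead of
# the iris classes (1 = "this class", 0 = "everything else").
ovr = OneVsRestClassifier(
    DecisionTreeClassifier(criterion="entropy", min_samples_leaf=5,
                           class_weight={0: 1, 1: 10}, random_state=300))
ovr.fit(X, y)
print(len(ovr.estimators_))  # one binary tree per iris class
```

With weights keyed this way, every one-vs-rest tree gives its own positive class ten times the weight of the rest. That is not the same as the original per-class weighting, so treat it as illustrative only.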

Brian Spiering · Accepted Answer

The combination of class_weight and indices_test having only 10 data points results in the error Class label 2 not present.
Since the iris dataset is perfectly balanced (50 samples per class), there is no reason to specify class_weight. Additionally, scikit-learn has train_test_split, which, when called with stratify=y, makes a split that keeps the same class proportions in both the training and the test set.
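Putting those suggestions together, here is a minimal sketch, assuming test_size=0.3, random_state=0 and no class_weight (none of these values come from the thread). It splits with stratify=y, wraps a DecisionTreeClassifier with the question's other parameters in a OneVsRestClassifier, and computes one ROC curve per class. The scores for class k are taken from the k-th fitted binary tree in estimators_; OneVsRestClassifier.predict_proba would return the same columns normalized so that each row sums to 1.

```python
# Sketch: stratified split, one-vs-rest decision tree, one ROC curve per class.
# Assumptions (not from the original post): test_size=0.3, random_state=0,
# and no class_weight, since iris is balanced.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
classes = [0, 1, 2]

# stratify=y keeps the 1:1:1 class proportions in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = OneVsRestClassifier(
    DecisionTreeClassifier(criterion="entropy", min_samples_leaf=5,
                           random_state=300))
clf.fit(X_train, y_train)

# Column k: probability of "positive" from the k-th binary tree (class k vs rest)
y_score = np.column_stack(
    [est.predict_proba(X_test)[:, 1] for est in clf.estimators_])
# Column k: 1 if the test sample belongs to class k, else 0
y_test_bin = label_binarize(y_test, classes=classes)

fpr, tpr, roc_auc = {}, {}, {}
for k in classes:
    fpr[k], tpr[k], _ = roc_curve(y_test_bin[:, k], y_score[:, k])
    roc_auc[k] = auc(fpr[k], tpr[k])
    print(f"class {k}: AUC = {roc_auc[k]:.3f}")
```

To draw the three curves on one graph, call matplotlib's plt.plot(fpr[k], tpr[k], label=...) once per class and then plt.legend().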
