TransWikia.com

What should be the order of class names in the sklearn tree export function? (Beginner question on Python sklearn)

Data Science Asked by knightrider on January 9, 2021

I am trying a simple example with an sklearn decision tree. I am giving “number, is_power2, is_even” as features, and the class is “is_even” (of course this is stupid).

Here is the code:

from sklearn import tree
import pydotplus
from IPython.display import Image

# Each row is [number, is_power_of_2, is_even]
features = [[1, 0, 0], [2, 1, 1], [3, 0, 0], [4, 1, 1],
            [5, 0, 0], [6, 0, 1], [900, 0, 1], [1001, 0, 0]]
# 'o' = odd, 'e' = even
labels = ['o', 'e', 'o', 'e', 'o', 'e', 'e', 'o']

clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)

print(clf.predict([[203, 0, 0]]))

dot_data = tree.export_graphviz(clf, out_file=None,
                     feature_names=['number', 'pow2', 'even'],
                     class_names=['o', 'e'],
                     filled=True, rounded=True,
                     special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data)
# Image(graph.create_png())
graph.write_pdf("1.pdf")

The decision tree correctly identifies even and odd numbers and the predictions are working properly.

The decision tree is basically like this (in pdf)

     is_even<=0.5
        /    \
       /      \
   label1    label2

The problem is this: label1 is marked “o” and not “e”. However, if I pass class_names to the export function as

class_names=['e','o']

then the result is correct. I thought the output should be independent of the order of class_names.

Am I doing something wrong, or does the order of class_names matter? If the latter is true, what is the right order (for an arbitrary problem)?

3 Answers

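The names should be given in ascending order of the class labels, that is, the sorted order the fitted classifier stores in clf.classes_. For the string labels in the question, 'e' sorts before 'o', so class_names=['e','o'] is the right order.

I saw this in the docstring of export_graphviz: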

class_names : list of strings, bool or None, optional (default=None)
    Names of each of the target classes in ascending numerical order.
    Only relevant for classification and not supported for multi-output.
    If ``True``, shows a symbolic representation of the class name.
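
To avoid typing the names by hand, one approach is to build class_names from the fitted model's classes_ attribute, which holds the unique labels in sorted order. The following is a minimal sketch using the data from the question; random_state=0 is an addition for reproducibility and is not part of the original code.

```python
from sklearn import tree

# Same toy data as in the question: [number, is_power_of_2, is_even]
features = [[1, 0, 0], [2, 1, 1], [3, 0, 0], [4, 1, 1],
            [5, 0, 0], [6, 0, 1], [900, 0, 1], [1001, 0, 0]]
labels = ['o', 'e', 'o', 'e', 'o', 'e', 'e', 'o']

# random_state is an assumption added here only to make the run repeatable
clf = tree.DecisionTreeClassifier(random_state=0).fit(features, labels)

# classes_ holds the unique labels in sorted order; for strings this is
# lexicographic, so 'e' comes before 'o'
print(clf.classes_)

# Derive the display names from the model instead of hard-coding the order
class_names = [str(c) for c in clf.classes_]

dot_data = tree.export_graphviz(
    clf, out_file=None,
    feature_names=['number', 'pow2', 'even'],
    class_names=class_names,
    filled=True, rounded=True)
```

Because the list comes from the classifier itself, it should stay aligned with the tree's internal class indices even if the set of labels changes.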

Correct answer by moomima on January 9, 2021

As described in the documentation, the names should be given in ascending order.

You can check the order used by the algorithm: the first box of the tree shows the counts for each class (of the target variable). The order is ascending by class name.

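# class_names : list of strings, bool or None, optional (default=None)
# Names of each of the target classes in ascending numerical order.

# labels is a plain Python list here, so use set() to get the unique values
# (for a pandas Series, labels.unique() would work instead)
class_names2pass = sorted(set(labels))
print(class_names2pass)  # ['e', 'o']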

Answered by Juan Carlos Ibañez on January 9, 2021

One option is to convert the labels from strings to numeric values, for instance 'o' = 0 and 'e' = 1.

Your labels will then look like this:

labels =  [0, 1, 0, 1, 0, 1, 1, 0]

class_names should then list the names for those numbers in ascending numeric order:

0 - 'o'
1 - 'e'

class_names=['o', 'e']
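
The conversion above can be sketched in plain Python. The dictionary name label_to_id and the particular 0/1 assignment are assumptions made for illustration; any consistent mapping would work as long as class_names is ordered by the numeric id.

```python
labels = ['o', 'e', 'o', 'e', 'o', 'e', 'e', 'o']

# Hypothetical mapping chosen for illustration: odd -> 0, even -> 1
label_to_id = {'o': 0, 'e': 1}

# Replace each string label with its numeric id
numeric_labels = [label_to_id[label] for label in labels]

# Order the display names by numeric id so that position i in class_names
# corresponds to class id i
class_names = [name for name, _ in sorted(label_to_id.items(), key=lambda kv: kv[1])]

print(numeric_labels)
print(class_names)
```

Fitting the classifier on numeric_labels and passing this class_names list to export_graphviz keeps the two in the same order.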

Answered by Vlad Bezden on January 9, 2021
