Data Science Asked by knightrider on January 9, 2021
I am trying a simple example with sklearn's decision tree. I am giving “number, is_power2, is_even” as features, and the class is “is_even” (of course this is stupid).
Here is the code:
from sklearn import tree
features = [[1, 0, 0], [2, 1, 1], [3, 0, 0], [4, 1, 1], [5, 0, 0], [6, 0, 1], [900, 0, 1], [1001, 0, 0]]  # val, pow2, even
labels = ['o', 'e', 'o', 'e', 'o', 'e', 'e', 'o']  # is even
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)
print(clf.predict([[203, 0, 0]]))
import pydot
import pydotplus
from IPython.display import Image
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=['number', 'pow2', 'even'],
                                class_names=['o', 'e'],
                                filled=True, rounded=True,
                                special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data)
# Image(graph.create_png())
graph.write_pdf("1.pdf")
The decision tree correctly identifies even and odd numbers and the predictions are working properly.
The decision tree is basically like this (in the PDF):
     is_even<=0.5
      /        \
     /          \
label1        label2
The problem is this: label1 is marked “o” and not “e”. However, if I pass class_names to the export function as
class_names=['e','o']
then the result is correct. I thought the output should be independent of the order of class_names.
Am I doing something wrong, or does the order of class_names matter? If the latter is true, what is the right order (for an arbitrary problem)?
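The names should be given in ascending numerical order. For string labels such as 'o' and 'e', that means sorted order, i.e. ['e', 'o'], which is why the swapped list renders correctly.
I saw this in the docstring of export_graphviz: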
class_names : list of strings, bool or None, optional (default=None)
    Names of each of the target classes in ascending numerical order.
    Only relevant for classification and not supported for multi-output.
    If ``True``, shows a symbolic representation of the class name.
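One way to avoid guessing the order is to read it back from the fitted classifier. The sketch below reuses the data from the question and assumes that the `classes_` attribute of the fitted `DecisionTreeClassifier` lists the labels in the same order that `export_graphviz` expects; `random_state=0` is only added here to make the run repeatable.

```python
from sklearn import tree

# Same toy data as in the question: [number, is_power2, is_even]
features = [[1, 0, 0], [2, 1, 1], [3, 0, 0], [4, 1, 1],
            [5, 0, 0], [6, 0, 1], [900, 0, 1], [1001, 0, 0]]
labels = ['o', 'e', 'o', 'e', 'o', 'e', 'e', 'o']

clf = tree.DecisionTreeClassifier(random_state=0).fit(features, labels)

# classes_ holds the distinct labels in the sorted order used internally
class_names = [str(c) for c in clf.classes_]
print(class_names)

# Pass the names in that order instead of hard-coding them
dot_data = tree.export_graphviz(
    clf, out_file=None,
    feature_names=['number', 'pow2', 'even'],
    class_names=class_names,
    filled=True, rounded=True, special_characters=True)
```

The resulting `dot_data` string can be handed to `pydotplus.graph_from_dot_data` exactly as in the question.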
Correct answer by moomima on January 9, 2021
As described in the documentation, the names should be given in ascending order.
You can check the order used by the algorithm: the root node of the tree shows the counts for each class of the target variable, and those counts are listed in ascending order of the class labels.
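# class_names : list of strings, bool or None, optional (default=None)
#     Names of each of the target classes in ascending numerical order.
# labels is a plain Python list here, so use set(); for a pandas Series, labels.unique() would work instead
class_names2pass = sorted(set(labels))
print(class_names2pass)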
Answered by Juan Carlos Ibañez on January 9, 2021
What you need to do is convert the labels from string/char to numeric values, for instance 'o' = 0 and 'e' = 1.
So your labels will look like this:
labels = [0, 1, 0, 1, 0, 1, 1, 0]
class_names should match those numbers in ascending numeric order:
0 - 'o'
1 - 'e'
class_names=['o', 'e']
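The conversion described above could look roughly like this. It is a minimal sketch: the `label_to_int` dictionary and the variable names are made up for illustration, and `random_state=0` is only there to make the run repeatable.

```python
from sklearn import tree

# Same toy data as in the question: [number, is_power2, is_even]
features = [[1, 0, 0], [2, 1, 1], [3, 0, 0], [4, 1, 1],
            [5, 0, 0], [6, 0, 1], [900, 0, 1], [1001, 0, 0]]
labels = ['o', 'e', 'o', 'e', 'o', 'e', 'e', 'o']

# Illustrative mapping from the answer: 'o' -> 0, 'e' -> 1
label_to_int = {'o': 0, 'e': 1}
numeric_labels = [label_to_int[label] for label in labels]

clf = tree.DecisionTreeClassifier(random_state=0).fit(features, numeric_labels)

# Order the names by their numeric code so they line up with classes 0, 1
class_names = [name for name, code in
               sorted(label_to_int.items(), key=lambda item: item[1])]

prediction = clf.predict([[203, 0, 0]])
print(class_names, prediction)
```

With this encoding, the order of `class_names` is decided by the mapping you chose rather than by the alphabetical order of the original strings.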
Answered by Vlad Bezden on January 9, 2021