Data Science Asked by bob2 on December 3, 2020
I am not sure how to interpret the results of my decision tree after using target encoding. Could someone clarify? The example below does not actually need target encoding; it only serves to illustrate my confusion.
For instance, I am trying to classify whether a fruit is rotten or not, given its age and fruit type. I use target encoding for the fruit column:
I then get the following decision tree with default sklearn decision tree classifier parameters:
I believe that after encoding I have lost the information about fruit type, and I can only say that the fruit is rotten if fruit_target <= 0.841 and not rotten otherwise. But then how do I interpret 0.841; what does it mean?
Recall what the target encoding actually is in this example: it is the share of rotten fruits per fruit type. For example, $75\%$ of the data points with fruit == pear are estimated to be rotten (I say "estimated" because whether this is an exact number or an estimate depends on the type of target encoding).
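To make the definition concrete, here is a minimal sketch of plain (unsmoothed) mean target encoding with pandas. The toy data is made up for illustration, since the table from the question is not available; it is chosen so that pear comes out at 75% rotten. The column names `fruit`, `rotten`, and `fruit_target` are assumptions that follow the question's wording.

```python
import pandas as pd

# Hypothetical toy data (not the asker's actual table).
df = pd.DataFrame({
    "fruit": ["pear", "pear", "pear", "pear", "apple", "apple", "apple", "apple"],
    "rotten": [1, 1, 1, 0, 0, 0, 1, 0],
})

# Plain target encoding: the mean of the target per category,
# i.e. the share of rotten fruits for each fruit type.
encoding = df.groupby("fruit")["rotten"].mean()

# Replace each fruit type by its share of rotten fruits.
df["fruit_target"] = df["fruit"].map(encoding)

print(encoding)
```

With smoothed or cross-fitted variants of target encoding, the encoded values would only approximate these shares, which is why the number is an estimate in general.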
Accordingly, you can infer from the decision tree that a data point will be classified as rotten iff its fruit type has more than $0.841 = 84.1\%$ rotten data points in the training set.
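The fruit type is therefore not lost: the threshold can be mapped back to the categories by comparing it with each fruit type's encoded value. The following is a rough sketch under several assumptions: the data is invented, only the encoded fruit column is used (the age feature is left out), and `max_depth=1` is set so that the tree has a single split, which differs from the default parameters mentioned in the question.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data (not the asker's actual table).
df = pd.DataFrame({
    "fruit": ["apple"] * 4 + ["pear"] * 4 + ["banana"] * 4,
    "rotten": [0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1],
})

# Plain mean target encoding: share of rotten fruits per fruit type.
encoding = df.groupby("fruit")["rotten"].mean()
df["fruit_target"] = df["fruit"].map(encoding)

# Assumption: a single split (max_depth=1) to keep the tree easy to read.
clf = DecisionTreeClassifier(max_depth=1, random_state=0)
clf.fit(df[["fruit_target"]], df["rotten"])

# Split threshold at the root node of the fitted tree.
threshold = clf.tree_.threshold[0]

# Map the numeric threshold back to fruit types.
above = sorted(encoding[encoding > threshold].index)
below = sorted(encoding[encoding <= threshold].index)

print(f"threshold = {threshold}")
print("encoded value above threshold:", above)
print("encoded value at or below threshold:", below)
```

In this toy setup, the fruit types whose share of rotten training points exceeds the threshold end up on the "rotten" side of the split, so a rule on `fruit_target` can be read as a rule on a set of fruit types.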
Correct answer by Sammy on December 3, 2020