Data Science Asked on May 5, 2021
For the above table, the midpoints for possible split points are 22.5 and 35. I calculated the entropy and gain for each value, and 35 had the minimum entropy and the highest gain. Is this correct?
Given High -> (-), and Low -> (+)
D<22.5 => [0+, 2-],
Entropy (D<22.5) = 0, since all the values are of the same class High.
D>22.5 => [2+, 2-],
Entropy (D>22.5) = 1, since the values are distributed equally among Low and High classes.
D<35 => [2+, 3-],
Entropy (D<35) = -[2/6 x $log_2$(2/6) + 3/6 x $log_2$(3/6)]= 0.5
D>35 => [0+, 1-],
Entropy (D>35) = 0, since all the values are of the same class High
Gain (D, Age>22.5) = 0.918 – 2/6 (0) – 4/6 (1) = 0.2513
Gain (D, Age>35) = 0.918 – 5/6 (0.5) – 1/6 (0) = 0.5103
Is that right?
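For the split at 35, there is an error in the denominator. It should be 5, since D<35 contains 5 items in total:
D<35 => [2+, 3-],
Entropy (D<35) = -[2/5 x $log_2$(2/5) + 3/5 x $log_2$(3/5)] ≈ 0.971
With this value, Gain (D, Age>35) = 0.918 – 5/6 (0.971) – 1/6 (0) ≈ 0.109, which is lower than the gain of 0.2513 for the split at 22.5. So 22.5, not 35, is the better split point.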
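The arithmetic can be checked with a short Python sketch. It assumes only the class counts quoted in the question ([2+, 4-] overall, [0+, 2-] and [2+, 2-] for the split at 22.5, [2+, 3-] and [0+, 1-] for the split at 35). The helper names `entropy` and `gain` are illustrative and do not come from any library.

```python
from math import log2

def entropy(pos, neg):
    """Binary entropy (in bits) of a node with `pos` positive and `neg` negative items."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # an empty class contributes 0 (0 * log2(0) is taken as 0)
            p = count / total
            result -= p * log2(p)
    return result

def gain(parent, left, right):
    """Information gain of splitting `parent` into `left` and `right` (each a (pos, neg) pair)."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(*child) for child in (left, right))
    return entropy(*parent) - weighted

parent = (2, 4)  # [2+, 4-], parent entropy is about 0.918

gain_22_5 = gain(parent, (0, 2), (2, 2))  # split at Age = 22.5
gain_35 = gain(parent, (2, 3), (0, 1))    # split at Age = 35

print(round(entropy(2, 3), 3))  # 0.971
print(round(gain_22_5, 3))      # 0.252
print(round(gain_35, 3))        # 0.109
```

The gain for 22.5 prints as 0.252 rather than 0.2513 only because the sketch uses the unrounded parent entropy instead of 0.918.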
Correct answer by jdsuryap on May 5, 2021