TransWikia.com

How to select the split point for Continuous Attribute Age

Data Science Asked on May 5, 2021

[Image not available: table of six records with a continuous Age attribute and a High/Low class label]

For the table above, the candidate split points (midpoints) are 22.5 and 35. I calculated the entropy and information gain for each candidate, and 35 had the lowest entropy and the highest gain. Is that correct?

Here High is the negative class (-) and Low is the positive class (+).

D<22.5 => [0+, 2-],
Entropy (D<22.5) = 0, since all the values are of the same class High.

D>22.5 => [2+, 2-],
Entropy (D>22.5) = 1, since the values are distributed equally among Low and High classes.

D<35 => [2+, 3-],
Entropy (D<35) = -[2/6 x $log_2$(2/6) + 3/6 x $log_2$(3/6)] = 0.5

D>35 => [0+, 1-],
Entropy (D>35) = 0, since all the values are of the same class High

Gain (D, Age>22.5) = 0.918 – 2/6 (0) – 4/6 (1) = 0.2513

Gain (D, Age>35) = 0.918 – 5/6 (0.5) – 1/6 (0) = 0.5013

Is that right?

One Answer

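For the 35 split, there is an error in the denominator: D<35 contains 5 items in total, so the denominator should be 5, not 6.

D<35 => [2+, 3-], Entropy (D<35) = -[2/5 x $log_2$(2/5) + 3/5 x $log_2$(3/5)] ≈ 0.971

With this value, Gain (D, Age>35) = 0.918 – 5/6 (0.971) – 1/6 (0) ≈ 0.109, which is lower than the gain for the 22.5 split.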
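The correction can be checked numerically. The following is a minimal sketch, not part of the original answer: it assumes only the class counts quoted in the question ([2+, 4-] overall, and the per-partition counts for each threshold), and the helper names `entropy` and `information_gain` are illustrative.

```python
from math import log2

def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    # Skip empty classes: the term 0 * log2(0) is taken as 0.
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the partitions."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Class counts as [Low (+), High (-)], copied from the question.
parent = [2, 4]
splits = {
    22.5: ([0, 2], [2, 2]),  # Age < 22.5, Age > 22.5
    35.0: ([2, 3], [0, 1]),  # Age < 35,   Age > 35
}

gains = {t: information_gain(parent, parts) for t, parts in splits.items()}
for threshold, gain in gains.items():
    print(f"Age > {threshold}: gain = {gain:.4f}")

best = max(gains, key=gains.get)
print("best split:", best)
```

Under these counts the sketch gives a gain of about 0.2516 for the 22.5 split and about 0.1092 for the 35 split, so 22.5 comes out as the better threshold once the denominator is corrected.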

Correct answer by jdsuryap on May 5, 2021
