Data Science Asked by A-ar on December 23, 2020
According to a website (http://dataaspirant.com/2017/01/30/how-decision-tree-algorithm-works/), these values are chosen randomly:
I don't think that is the case for any optimized way of building a decision tree. In this image (a different example), the split value at the root node is 2.45:
Was this value chosen randomly, as the website explains? If not, how is it calculated?
The values used to split the data at each node are chosen to minimize an impurity measure such as the Gini impurity index or the entropy, both of which quantify how mixed the classes are within a node. The tree chooses the value that separates your classes best.
Take an example: you have a population of 10 people, and the only variable available is age. You are predicting whether a person has a certain disease; assume it is a binary classification problem. After doing some EDA, you notice that between ages 20 and 40 both classes are similarly present, whereas from age 56 upward you have 4 ones and no zeros. Below 56, the classes are equally distributed across that range.
When deciding which value to split a node on, the tree calculates the information gain (or Gini impurity) for each candidate value, then splits the population into two leaves so that the impurity in those leaves is minimized as much as possible. Here you would get one leaf with 4 ones and no zeros, and another leaf with 3 zeros and 3 ones.
In simpler terms, when splitting a population at a node, the tree tries to make the resulting leaves as pure as possible, where "pure" means containing only one class. That is how the split value is decided.
Hope it makes sense.
Answered by Blenz on December 23, 2020
No, I don't think the values on which the branches are separated are chosen at random. Instead, a weighted average entropy is calculated for each feature, and the feature yielding the highest information gain is chosen as the root node. Consider this dataset:
In the picture above, outlook is chosen as the root node. How is outlook chosen as the root node?
First, we calculate the total entropy of the data; let's say it is 0.95. To pick the right root node, we find the weighted average entropy of each feature's subcategories. There are 4 features here, so we obtain 4 weighted entropy averages; let's say they are 0.3, 0.2, 0.4 and 0.8. We then subtract each weighted entropy average from the total entropy: (0.95 − 0.3), (0.95 − 0.2), (0.95 − 0.4), (0.95 − 0.8). These 4 values are the information gains of the features. Whichever feature has the highest information gain is picked as the root node; in our case, that is the outlook feature. Hope it helps
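The comparison above can be written out numerically. This is just a sketch using the made-up entropy numbers from the answer (total entropy 0.95; weighted entropies 0.3, 0.2, 0.4, 0.8); the feature names other than outlook are assumed from the classic play-tennis dataset and are illustrative only.

```python
# Hypothetical numbers from the example above.
total_entropy = 0.95

# Weighted average entropy after splitting on each feature.
weighted_entropy = {
    "outlook": 0.2,       # lowest weighted entropy
    "temperature": 0.3,   # feature names other than "outlook" are assumed
    "humidity": 0.4,
    "wind": 0.8,
}

# Information gain = total entropy - weighted entropy after the split.
info_gain = {f: total_entropy - h for f, h in weighted_entropy.items()}

# The feature with the highest information gain becomes the root node.
root = max(info_gain, key=info_gain.get)
print(root, round(info_gain[root], 2))  # → outlook 0.75
```

Note that minimizing the weighted entropy and maximizing the information gain pick the same feature, since the total entropy is the same constant for every candidate.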
Check this for more clarity
Answered by karthikeyan mg on December 23, 2020