
Information Gain & Gini Index for NLP

Asked on February 28, 2021

I know how Information Gain and the Gini Index work in general.

I'm having trouble figuring out how to apply these techniques to NLP and text feature extraction.

Can someone show me an example of how to implement these techniques in NLP?

Thanks

One Answer

I will cite explanations from "A Survey of Text Classification Algorithms" by Aggarwal and Zhai. The following and more can be read there, in Sections 2.1 (Gini Index) and 2.2 (Information Gain).

Gini Index

Let $p_1(w), \ldots, p_k(w)$ be the fraction of class-label presence of the $k$ different classes for the word $w$. In other words, $p_i(w)$ is the conditional probability that a document belongs to class $i$, given the fact that it contains the word $w$. Therefore, we have:

$\sum_{i=1}^{k} p_i(w) = 1$

Then the Gini index for the word $w$, denoted by $G(w)$, is defined as follows:

$G(w) = \sum_{i=1}^{k} p_i(w)^2$

The value of the Gini index $G(w)$ always lies in the range $(1/k, 1)$. Higher values of the Gini index $G(w)$ indicate a greater discriminative power of the word $w$.
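The survey gives no code, but the definition above is straightforward to compute. Below is a minimal Python sketch, assuming whitespace tokenization and a hypothetical two-class toy corpus (the corpus, labels, and function name are my own illustrative choices, not from the survey):

```python
from collections import Counter

def gini_index(docs, labels, word):
    """G(w) = sum_i p_i(w)^2, where p_i(w) is the conditional
    probability of class i given that a document contains w."""
    # Class counts over the documents that contain the word
    counts = Counter(label for doc, label in zip(docs, labels)
                     if word in doc.split())
    total = sum(counts.values())
    if total == 0:
        return 0.0  # word never occurs; treat as uninformative
    return sum((c / total) ** 2 for c in counts.values())

# Hypothetical toy corpus with k = 2 classes
docs = ["the game went into overtime",
        "the team won the game",
        "stocks fell sharply today",
        "the market closed higher"]
labels = ["sports", "sports", "finance", "finance"]

print(gini_index(docs, labels, "game"))  # 1.0: occurs only in 'sports' docs
print(gini_index(docs, labels, "the"))   # ~0.56: close to 1/k = 0.5
```

A word that occurs only in documents of one class scores $G(w) = 1$, while a word spread evenly across all $k$ classes scores $1/k$, matching the range above.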

Information Gain

Let $P_i$ be the global probability of class $i$, and $p_i(w)$ be the probability of class $i$, given that the document contains the word $w$. Let $F(w)$ be the fraction of the documents containing the word $w$. The information gain measure $I(w)$ for a given word $w$ is defined as follows:

$I(w) = -\sum_{i=1}^{k} P_i \cdot \log(P_i) + F(w) \cdot \sum_{i=1}^{k} p_i(w) \cdot \log(p_i(w)) + (1 - F(w)) \cdot \sum_{i=1}^{k} (1 - p_i(w)) \cdot \log(1 - p_i(w))$

The greater the value of the information gain $I(w)$, the greater the discriminatory power of the word $w$.
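Again, only as a sketch, using the same toy corpus and tokenization assumptions as above. One caveat on the formula: for the third term I compute the class distribution among documents that do not contain $w$, which is the usual entropy-based reading of information gain; a literal transcription of the $(1 - p_i(w))$ in the quote would give a slightly different quantity.

```python
import math
from collections import Counter

def entropy_term(subset_labels):
    """Returns sum_i p_i * log(p_i) over the given documents' labels."""
    counts = Counter(subset_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log(c / total) for c in counts.values())

def information_gain(docs, labels, word):
    contains = [word in doc.split() for doc in docs]
    f_w = sum(contains) / len(docs)  # F(w): fraction of docs containing w

    global_term = -entropy_term(labels)  # -sum_i P_i * log(P_i)
    with_w = entropy_term([l for l, c in zip(labels, contains) if c])
    without_w = entropy_term([l for l, c in zip(labels, contains) if not c])

    return global_term + f_w * with_w + (1 - f_w) * without_w

# Same hypothetical toy corpus as in the Gini index sketch
docs = ["the game went into overtime",
        "the team won the game",
        "stocks fell sharply today",
        "the market closed higher"]
labels = ["sports", "sports", "finance", "finance"]

print(information_gain(docs, labels, "game"))  # ~0.69: splits the classes perfectly
print(information_gain(docs, labels, "the"))   # ~0.22: weak class signal
```

A typical use for feature extraction is to score every word in the vocabulary this way and keep the top-scoring words as features.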

Answered by Leanora on February 28, 2021
