Data Science Asked on February 28, 2021
I know how Information Gain and the Gini Index work in general, but I'm having trouble figuring out how to apply these techniques to NLP and text feature extraction. Can someone show me an example of how to implement them for an NLP task?
Thanks
I will cite explanations from "A Survey of Text Classification Algorithms" by Aggarwal and Zhai. The following and more can be found in Sections 2.1 (Gini Index) and 2.2 (Information Gain) of that survey.
Let $p_1(w), \ldots, p_k(w)$ be the fractions of class-label presence of the $k$ different classes for the word $w$. In other words, $p_i(w)$ is the conditional probability that a document belongs to class $i$, given the fact that it contains the word $w$. Therefore, we have:
$\sum_{i=1}^{k} p_i(w) = 1$
Then, the Gini index for the word $w$, denoted by $G(w)$, is defined as follows:
$G(w) = \sum_{i=1}^{k} p_i(w)^2$
The value of the Gini index $G(w)$ always lies in the range $[1/k, 1]$: it reaches the lower bound $1/k$ when the word is spread evenly across all $k$ classes, and the upper bound $1$ when it occurs in only one class. Higher values of the Gini index $G(w)$ indicate a greater discriminative power of the word $w$.
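As a concrete illustration, here is a minimal Python sketch of how $G(w)$ could be computed over a small tokenized corpus. The function name, the toy spam/ham corpus, and the document representation (a list of token sets with a parallel list of labels) are my own assumptions, not from the survey:

```python
import numpy as np

def gini_index(docs, labels, word):
    """Gini index G(w) = sum_i p_i(w)^2, where p_i(w) is the fraction of
    documents containing `word` that belong to class i."""
    # Collect the class labels of the documents that contain the word
    containing = [y for doc, y in zip(docs, labels) if word in doc]
    if not containing:
        return None  # the word never occurs, so G(w) is undefined
    _, counts = np.unique(containing, return_counts=True)
    p = counts / counts.sum()     # conditional class probabilities p_i(w)
    return float(np.sum(p ** 2))  # in [1/k, 1]; higher = more discriminative

# Toy corpus: each document is a set of tokens, labels are parallel
docs = [{"free", "cheap", "pills"}, {"free", "meeting", "tomorrow"},
        {"cheap", "offer", "now"}, {"project", "meeting", "notes"}]
labels = ["spam", "ham", "spam", "ham"]

print(gini_index(docs, labels, "cheap"))  # 1.0 -> occurs only in spam
print(gini_index(docs, labels, "free"))   # 0.5 -> evenly split (= 1/k for k=2)
```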
Let $P_i$ be the global probability of class $i$, and $p_i(w)$ be the probability of class $i$, given that the document contains the word $w$. Let $F(w)$ be the fraction of the documents containing the word $w$. The information gain measure $I(w)$ for a given word $w$ is defined as follows:
$I(w) = -\sum_{i=1}^{k} P_i \cdot \log(P_i) + F(w) \cdot \sum_{i=1}^{k} p_i(w) \cdot \log(p_i(w)) + (1 - F(w)) \cdot \sum_{i=1}^{k} (1 - p_i(w)) \cdot \log(1 - p_i(w))$
The greater the value of the information gain $I(w)$, the greater the discriminative power of the word $w$.
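Here is a matching Python sketch that evaluates $I(w)$ exactly as written above, using the usual convention that $0 \cdot \log(0) = 0$. Again, the function name and the toy corpus are my own illustration:

```python
import numpy as np

def information_gain(docs, labels, word):
    """I(w) as defined above, with the convention 0 * log(0) = 0."""
    labels = np.asarray(labels)
    contains = np.array([word in doc for doc in docs])
    F = contains.mean()  # F(w): fraction of documents containing the word

    def plogp(p):
        return p * np.log(p) if p > 0 else 0.0

    I = 0.0
    for c in np.unique(labels):
        P_i = np.mean(labels == c)                               # global prior P_i
        p_iw = np.mean(labels[contains] == c) if F > 0 else 0.0  # p_i(w)
        I -= plogp(P_i)                 # - sum_i P_i * log(P_i)
        I += F * plogp(p_iw)            # + F(w) * sum_i p_i(w) * log(p_i(w))
        I += (1 - F) * plogp(1 - p_iw)  # + (1-F(w)) * sum_i (1-p_i(w)) * log(1-p_i(w))
    return I

# Same toy corpus as in the Gini example
docs = [{"free", "cheap", "pills"}, {"free", "meeting", "tomorrow"},
        {"cheap", "offer", "now"}, {"project", "meeting", "notes"}]
labels = ["spam", "ham", "spam", "ham"]

print(information_gain(docs, labels, "cheap"))  # ~0.693 -> highly informative
print(information_gain(docs, labels, "free"))   # ~0.0   -> uninformative
```

For feature selection, you would compute $G(w)$ or $I(w)$ for every word in the vocabulary and keep only the top-ranked words as features for your classifier.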
Answered by Leanora on February 28, 2021