Data Science Asked on February 28, 2021
I know how Information Gain and the Gini Index work in general, but I'm having trouble figuring out how to apply these techniques to NLP and text feature extraction. Can someone show me an example of how to implement them for an NLP task?
Thanks
I will cite explanations from "A Survey of Text Classification Algorithms" by Aggarwal and Zhai. The following and more can be found in Sections 2.1 (Gini Index) and 2.2 (Information Gain) of that survey.
Let $p_1(w), \ldots, p_k(w)$ be the fractions of class-label presence of the $k$ different classes for the word $w$. In other words, $p_i(w)$ is the conditional probability that a document belongs to class $i$, given the fact that it contains the word $w$. Therefore, we have:
$\sum_{i=1}^{k} p_i(w) = 1$
Then, the Gini index for the word $w$, denoted by $G(w)$, is defined as follows:
$G(w) = \sum_{i=1}^{k} p_i(w)^2$
The value of the Gini index $G(w)$ always lies in the range $[1/k, 1]$: it reaches the lower bound $1/k$ when the word is spread evenly across all $k$ classes, and the upper bound $1$ when it occurs in only one class. Higher values of the Gini index $G(w)$ indicate a greater discriminative power of the word $w$.
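As a concrete illustration, here is a minimal Python sketch of how $G(w)$ could be computed over a small tokenized corpus. The function name, the toy spam/ham corpus, and the document representation (a list of token sets with a parallel list of labels) are my own assumptions, not from the survey:

```python
import numpy as np

def gini_index(docs, labels, word):
    """Gini index G(w) = sum_i p_i(w)^2, where p_i(w) is the fraction of
    documents containing `word` that belong to class i."""
    # Collect the class labels of the documents that contain the word
    containing = [y for doc, y in zip(docs, labels) if word in doc]
    if not containing:
        return None  # the word never occurs, so G(w) is undefined
    _, counts = np.unique(containing, return_counts=True)
    p = counts / counts.sum()     # conditional class probabilities p_i(w)
    return float(np.sum(p ** 2))  # in [1/k, 1]; higher = more discriminative

# Toy corpus: each document is a set of tokens, labels are parallel
docs = [{"free", "cheap", "pills"}, {"free", "meeting", "tomorrow"},
        {"cheap", "offer", "now"}, {"project", "meeting", "notes"}]
labels = ["spam", "ham", "spam", "ham"]

print(gini_index(docs, labels, "cheap"))  # 1.0 -> occurs only in spam
print(gini_index(docs, labels, "free"))   # 0.5 -> evenly split (= 1/k for k=2)
```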
Let $P_i$ be the global probability of class $i$, and $p_i(w)$ be the probability of class $i$, given that the document contains the word $w$. Let $F(w)$ be the fraction of the documents containing the word $w$. The information gain measure $I(w)$ for a given word $w$ is defined as follows:
$I(w) = -\sum_{i=1}^{k} P_i \cdot \log(P_i) + F(w) \cdot \sum_{i=1}^{k} p_i(w) \cdot \log(p_i(w)) + (1 - F(w)) \cdot \sum_{i=1}^{k} (1 - p_i(w)) \cdot \log(1 - p_i(w))$
The greater the value of the information gain $I(w)$, the greater the discriminative power of the word $w$.
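Here is a matching Python sketch that evaluates $I(w)$ exactly as written above, using the usual convention that $0 \cdot \log(0) = 0$. Again, the function name and the toy corpus are my own illustration:

```python
import numpy as np

def information_gain(docs, labels, word):
    """I(w) as defined above, with the convention 0 * log(0) = 0."""
    labels = np.asarray(labels)
    contains = np.array([word in doc for doc in docs])
    F = contains.mean()  # F(w): fraction of documents containing the word

    def plogp(p):
        return p * np.log(p) if p > 0 else 0.0

    I = 0.0
    for c in np.unique(labels):
        P_i = np.mean(labels == c)                               # global prior P_i
        p_iw = np.mean(labels[contains] == c) if F > 0 else 0.0  # p_i(w)
        I -= plogp(P_i)                 # - sum_i P_i * log(P_i)
        I += F * plogp(p_iw)            # + F(w) * sum_i p_i(w) * log(p_i(w))
        I += (1 - F) * plogp(1 - p_iw)  # + (1-F(w)) * sum_i (1-p_i(w)) * log(1-p_i(w))
    return I

# Same toy corpus as in the Gini example
docs = [{"free", "cheap", "pills"}, {"free", "meeting", "tomorrow"},
        {"cheap", "offer", "now"}, {"project", "meeting", "notes"}]
labels = ["spam", "ham", "spam", "ham"]

print(information_gain(docs, labels, "cheap"))  # ~0.693 -> highly informative
print(information_gain(docs, labels, "free"))   # ~0.0   -> uninformative
```

For feature selection, you would compute $G(w)$ or $I(w)$ for every word in the vocabulary and keep only the top-ranked words as features for your classifier.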
Answered by Leanora on February 28, 2021