Data Science Asked by rahs on August 1, 2020
Consider a (somewhat nonsensical) sentence: “I see saw a see saw”.
The distinct observed bi-grams would be:
“I see”
“see saw” (which occurs twice)
“saw a”
“a see”
My aim is to smooth the bi-gram probabilities using Good-Turing smoothing. For this, I need to find the number of unseen bi-grams, i.e., bi-grams with a frequency count of 0.
How do I do this?
1) Would this be the list of all bi-grams formed from 2 words that never appear consecutively in the sentence? For example, “I saw”, “saw saw”, “a I”, etc.?
2) Would repetitions of the same word be included as bi-grams, e.g., “I I”, “see see”, etc.?
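I just remembered that this is done with a table in which every word of the vocabulary is the header of one row and of one column, so each cell corresponds to one bi-gram. As a result, the list of all possible bi-grams is every ordered pair of vocabulary words, including pairs that never appear consecutively (such as “I saw”) and repetitions of the same word (such as “I I”). With a vocabulary of V words there are V × V possible bi-grams, and the unseen ones are the cells with a count of 0. Here V = 4, so there are 16 possible bi-grams, of which 4 are observed and 12 are unseen.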
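The table idea can be sketched in a few lines of Python. This is only an illustration under some assumptions that are not in the question: the vocabulary is just the words of this one sentence, tokens are split on whitespace and are case-sensitive, and no sentence-boundary markers are added. The variable names are made up for the example. It also shows the usual Good-Turing estimate of the total probability mass reserved for unseen bi-grams, N_1 / N, spread evenly over the unseen cells.

```python
from collections import Counter
from itertools import product

# Assumption: whitespace tokenization, case-sensitive, no boundary markers.
tokens = "I see saw a see saw".split()

# Count each observed bi-gram (pair of adjacent words).
observed = Counter(zip(tokens, tokens[1:]))

# Every ordered pair of vocabulary words is a possible bi-gram,
# including repetitions such as ("I", "I").
vocab = sorted(set(tokens))
all_bigrams = set(product(vocab, repeat=2))

# Unseen bi-grams are the table cells with a count of 0.
unseen = all_bigrams - set(observed)

# Frequency of frequencies: N_c = number of distinct bi-grams seen c times.
freq_of_freq = Counter(observed.values())
freq_of_freq[0] = len(unseen)

# Good-Turing mass for unseen events: N_1 / N, where N is the
# total number of bi-gram tokens observed.
total = sum(observed.values())
unseen_mass = freq_of_freq[1] / total
per_unseen = unseen_mass / len(unseen)

print(len(vocab), len(all_bigrams), len(unseen))  # 4 16 12
```

For this sentence, N_1 = 3 and N = 5, so the unseen bi-grams share a mass of 0.6 between them, i.e. 0.05 each. With such a tiny sample the numbers are not meaningful as probabilities; the point is only how N_0 is obtained.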
Answered by rahs on August 1, 2020