Asked by Syed Mohsin Karim on July 21, 2021
This problem relates to natural language processing, independent of any specific language. I want to detect combinations of words that carry a meaning of their own, such as North America, South Asia, Albert Einstein, etc.
Some problems I faced:

South and Africa each receive separate probability scores, and n-gram generation also produces probability scores for many irrelevant word combinations. Discarding low-probability n-grams helps somewhat, but questions remain: how many n-grams should be generated, given that generating more increases both the processing time and the effective corpus size? And how should first, middle, and last names be handled? Even after filtering, many irrelevant combinations still end up high in the probability list. I am looking for an optimized way to deal with this kind of issue.
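For illustration, here is a minimal sketch of one common approach to this problem: score candidate bigrams with pointwise mutual information (PMI) and drop rare pairs with a frequency filter, using NLTK's collocation tools. The toy corpus and the cutoff values are assumptions for the example, not values from the question.

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

# Assumed toy corpus; in practice `tokens` is the tokenized text of
# your own corpus.
tokens = ("the trade summit was held in south africa last year "
          "south africa signed the agreement with north america").split()

measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(tokens)

# Drop bigrams seen fewer than 2 times: this removes many of the
# irrelevant one-off combinations that n-gram generation produces.
# The threshold of 2 is arbitrary here and should be tuned.
finder.apply_freq_filter(2)

# Rank the surviving bigrams by PMI, which rewards pairs that occur
# together far more often than their individual frequencies predict.
for bigram, score in finder.score_ngrams(measures.pmi):
    print(bigram, round(score, 2))
```

The same pattern extends to three-word units (e.g., first-middle-last names) with TrigramCollocationFinder and TrigramAssocMeasures, so you do not have to enumerate every possible n-gram order: in practice, collocation extraction rarely needs to go beyond n = 3 or 4.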
Note: I am working on a language that does not have a rich corpus like English.
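Since a low-resource setting rules out approaches that depend on large pretrained models, a purely count-based phrase detector may be worth trying. Below is a minimal sketch using gensim's Phrases model, which merges word pairs whose co-occurrence count is high relative to their individual counts; applying it twice is one common way to capture three-part names. The sentences and the min_count/threshold values here are assumptions for the example.

```python
from gensim.models.phrases import Phrases

# Assumed toy corpus of tokenized sentences; in practice these come
# from your own (small) corpus, and no pretrained resources are needed.
sentences = [
    ["albert", "einstein", "was", "born", "in", "ulm"],
    ["the", "papers", "of", "albert", "einstein", "survive"],
    ["south", "africa", "borders", "namibia"],
    ["he", "travelled", "across", "south", "africa"],
]

# First pass merges frequent pairs into single tokens such as
# "albert_einstein" and "south_africa". min_count=1 and threshold=1
# are deliberately permissive for this tiny corpus (the defaults are
# 5 and 10); tune them on real data.
bigrams = Phrases(sentences, min_count=1, threshold=1)

# A second pass over the bigrammed corpus can merge a detected bigram
# with an adjacent token, which is one way to capture three-part
# names (first, middle, last) without scoring every raw trigram.
trigrams = Phrases(bigrams[sentences], min_count=1, threshold=1)

for sent in sentences:
    print(trigrams[bigrams[sent]])
```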