Data Science Asked by Jemar Villareal on November 8, 2020
I am new to NLP, and I would like to ask how I can extract sentences from a text based on a set of keywords, using Python. I created a list of keywords that will be used to extract sentences from the document.
If this were a simple tokenization problem, in which I loop over the tokens and check each one against the keyword list, how could I also capture synonyms or related words?
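As a baseline, the loop-over-tokens approach described above might look like the minimal sketch below. It assumes a naive regex sentence splitter and a hand-written synonym map (`SYNONYMS` is hypothetical; in practice it might come from a thesaurus such as WordNet). A sentence matches a keyword if it shares at least one token with the keyword or with one of that keyword's listed synonyms.

```python
import re

# Hypothetical, hand-written synonym map used only for illustration.
SYNONYMS = {
    "confidentiality": {"confidential", "secure", "secret", "private"},
    "business": {"business", "commercial", "company"},
}

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Lowercase alphabetic tokens; punctuation and digits are dropped.
    return set(re.findall(r"[a-z]+", sentence.lower()))

def match_keywords(text, keywords):
    """Return {keyword: [sentences]} where a sentence shares at least one
    token with the keyword or with any of its listed synonyms."""
    results = {kw: [] for kw in keywords}
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        for kw in keywords:
            terms = tokenize(kw)
            # Expand each keyword word with its synonyms (if any are listed).
            for word in list(terms):
                terms |= SYNONYMS.get(word, set())
            if tokens & terms:
                results[kw].append(sentence)
    return results
```

With the two example sentences, "secure" is only matched to "Confidentiality" because it was put in the synonym list by hand; this approach is only as good as that list, which is exactly the limitation the question is about.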
For example:
Keyword: Internal business
Sentence: You can only use this software for your business only.
Keyword: Confidentiality
Sentence: Information will be kept as secure as possible.
I have already implemented text categorization using TF-IDF, but with a small dataset and a large number of keywords, I don't think that approach will work here.
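One way to see why a TF-IDF approach struggles here: TF-IDF cosine similarity only rewards words that literally appear in both texts. The sketch below is a small pure-Python TF-IDF (raw counts times a common smoothed idf, `log((1 + N) / (1 + df)) + 1`), written for illustration rather than as the exact setup the asker used.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase alphabetic tokens.
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document,
    using raw term counts and a smoothed idf."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]

def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

vecs = tfidf_vectors(["Confidentiality", "Information will be kept as secure as possible."])
print(cosine(vecs[0], vecs[1]))  # 0.0: no shared words, so no similarity
```

Because "Confidentiality" and "Information will be kept as secure as possible." share no tokens, their similarity is zero no matter how the weights are tuned, which is why related-word matching needs something beyond surface overlap.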
Thanks in advance.
Is it possible to apply pre-trained models like word2vec?
Is it also possible to create a custom model that suits my needs?
The ideal way to get the related sentences would be to compute a sentence vector for each sentence you want to categorise and then compare the vectors of your predefined keywords with those sentence vectors. You can get a sentence vector by simply averaging the word vectors of the words in the sentence. Once the sentence vectors are obtained, you can use cosine similarity to compare the keyword vectors with the sentence vectors. The highest cosine similarity gives you the best match.
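A minimal sketch of the averaging-plus-cosine approach described above. The 3-dimensional vectors in `EMBEDDINGS` are made up purely for illustration; in practice they would come from a pre-trained model such as word2vec (for example, loaded with gensim's `KeyedVectors`), usually with a few hundred dimensions. This sketch interprets "the max cosine similarity" as picking, for each sentence, the keyword whose averaged vector is closest.

```python
import math
import re

# Hypothetical toy word vectors, invented for this example only.
EMBEDDINGS = {
    "business":        [0.9, 0.1, 0.0],
    "internal":        [0.7, 0.2, 0.1],
    "software":        [0.6, 0.3, 0.0],
    "use":             [0.5, 0.2, 0.1],
    "confidentiality": [0.1, 0.9, 0.2],
    "information":     [0.2, 0.7, 0.1],
    "secure":          [0.1, 0.8, 0.3],
    "kept":            [0.2, 0.5, 0.2],
}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def average_vector(text, embeddings):
    """Average the vectors of all words that have an embedding; None if none do."""
    vecs = [embeddings[w] for w in tokenize(text) if w in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_keyword(sentence, keywords, embeddings):
    """Return (keyword, score) with the highest cosine similarity to the
    sentence's averaged vector, or None if nothing can be compared."""
    s_vec = average_vector(sentence, embeddings)
    if s_vec is None:
        return None
    scored = []
    for kw in keywords:
        k_vec = average_vector(kw, embeddings)
        if k_vec is not None:
            scored.append((kw, cosine(s_vec, k_vec)))
    return max(scored, key=lambda pair: pair[1]) if scored else None

keywords = ["Internal business", "Confidentiality"]
print(best_keyword("Information will be kept as secure as possible.", keywords, EMBEDDINGS))
```

Words without a vector (here, "will", "as", "possible") are simply skipped when averaging. Always taking the maximum assigns every sentence to some keyword; to extract only relevant sentences you would likely also want a minimum similarity threshold, tuned on your own data.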
Answered by Gyan Ranjan on November 8, 2020