Data Science Asked on March 7, 2021
I have a movie transcript with no punctuation at all: no commas, no periods, and no newlines. Is there an NLP technique that can restore the missing punctuation?
This can be approached with "text segmentation". NLP libraries such as spaCy can break a given text into sentences and build a dependency parse tree for each one.
Once the text is split into sentences, you can insert a period or question mark at the end of each sentence. Similarly, the dependency tree can help with inserting some other punctuation marks (for example, commas between clauses), but not all of them.
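As a rough illustration of the dependency-tree idea, here is a minimal sketch. It assumes the parse of one sentence is given as parallel lists of words, dependency labels, head indices and POS tags, the way spaCy exposes them (`token.text`, `token.dep_`, `token.head.i`, `token.pos_`), and that, as in spaCy's English models, both `cc` and `conj` attach to the first conjunct. The two comma rules and the helper names `insert_commas` and `subtree` are hypothetical heuristics, not a spaCy feature.

```python
def subtree(i, children):
    """Return the indices of token i and all of its descendants."""
    stack, seen = [i], set()
    while stack:
        j = stack.pop()
        if j not in seen:
            seen.add(j)
            stack.extend(children[j])
    return seen


def insert_commas(words, deps, heads, pos):
    """Insert commas into one parsed sentence using two heuristic rules.

    Assumption (hypothetical rules, spaCy-style English labels):
    Rule 1: a coordinating conjunction ("cc") attached to a verb that also
            has a verbal "conj" child joins two clauses, so a comma goes
            before it ("... close-ups, but Burton does ...").
    Rule 2: a relative clause ("relcl") that starts with "which" is treated
            as non-restrictive and wrapped in commas.
    """
    n = len(words)
    # Build child lists from head indices; spaCy's root token is its own head.
    children = [[] for _ in range(n)]
    for i, h in enumerate(heads):
        if h != i:
            children[h].append(i)

    comma_before, comma_after = set(), set()
    verbal = ("VERB", "AUX")
    for i in range(n):
        # Rule 1: clause-level coordination.
        if deps[i] == "cc" and pos[heads[i]] in verbal:
            if any(deps[c] == "conj" and pos[c] in verbal for c in children[heads[i]]):
                comma_before.add(i)
        # Rule 2: "which" relative clauses (span = min..max of the subtree).
        if deps[i] == "relcl":
            span = subtree(i, children)
            start, end = min(span), max(span)
            if words[start].lower() == "which":
                comma_before.add(start)
                if end < n - 1:  # no comma at the very end of the sentence
                    comma_after.add(end)

    out = []
    for i, word in enumerate(words):
        # Attach the comma to the previous word, avoiding doubled commas.
        if i in comma_before and out and not out[-1].endswith(","):
            out[-1] += ","
        out.append(word)
        if i in comma_after:
            out[-1] += ","
    return " ".join(out)


if __name__ == "__main__":
    # Hand-written parse with spaCy-style labels (illustrative, not model output).
    print(insert_commas(
        ["I", "was", "late", "but", "nobody", "noticed"],
        ["nsubj", "ROOT", "acomp", "cc", "nsubj", "conj"],
        [1, 1, 1, 1, 5, 1],
        ["PRON", "AUX", "ADJ", "CCONJ", "PRON", "VERB"],
    ))
```

With spaCy, the four lists can be built from a parsed doc, e.g. `[t.dep_ for t in doc]` and `[t.head.i for t in doc]`. Joining tokens with single spaces is a simplification: spaCy splits "He's" into two tokens, so a fuller version would rebuild the text with `token.whitespace_`.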
Example (breaking text into sentences):
import spacy

# Small English pipeline; install it first with: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')
text = "I was expecting a surplus of cute close-ups but Burton does surprisingly little to win us over He's never been big on treacle but a bit more warmth in this chilly movie which barely follows the outline of the 1941 original would have gone a long way"
doc = nlp(text)
# doc.sents yields one Span per detected sentence
for sentence in doc.sents:
    print(sentence.text)
Output (one sentence per line):
I was expecting a surplus of cute close-ups but Burton does surprisingly little to win us over
He's never been big on treacle but a bit more warmth in this chilly movie which barely follows the outline of the 1941 original would have gone a long way
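Once the sentences are split, adding the end marks can be as simple as the sketch below. It assumes a hypothetical rule: a sentence is a question when its first word is a wh-word or an auxiliary verb. This misses questions such as "You're leaving" and mislabels some imperatives such as "Do it now", so treat it as a starting point only. The names `punctuate_sentence`, `punctuate` and `QUESTION_STARTERS` are illustrative.

```python
# Hypothetical heuristic: a sentence is treated as a question if its first
# word is a wh-word or an auxiliary verb (e.g. "is it ...", "do you ...").
QUESTION_STARTERS = {
    "what", "why", "how", "where", "when", "who", "whom", "whose",
    "is", "are", "was", "were", "do", "does", "did",
    "can", "could", "will", "would", "should", "shall", "have", "has",
}


def punctuate_sentence(sentence: str) -> str:
    """Capitalize the first letter and append '?' or '.'."""
    text = sentence.strip()
    if not text:
        return text
    first_word = text.split()[0].lower()
    mark = "?" if first_word in QUESTION_STARTERS else "."
    return text[0].upper() + text[1:] + mark


def punctuate(sentences: list[str]) -> str:
    """Punctuate each sentence and join them into one paragraph."""
    punctuated = (punctuate_sentence(s) for s in sentences)
    return " ".join(p for p in punctuated if p)


if __name__ == "__main__":
    # In practice the list would come from spaCy:
    # [sent.text for sent in nlp(text).sents]
    # The second sentence is an illustrative example, not from the transcript.
    print(punctuate([
        "I was expecting a surplus of cute close-ups but Burton does surprisingly little to win us over",
        "did you see the original",
    ]))
```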
More details: https://spacy.io/usage/linguistic-features
Answered by Shamit Verma on March 7, 2021