Stack Overflow Asked by JeniFav on November 12, 2021
I have string data I’ve pulled from the internet. I want to parse it into its full sentences.
So, for example:
library(RXKCD)
library(stringr)
searchXKCD("health")
getXKCD(574)
tweets <- getXKCD(574)
tweets$transcript # This is the string I want to parse.
cols <- str_extract_all(tweets$transcript, "[A-Za-z]+") # I know how to extract the individual words, but that's not what I want to do.
# just because
freq <- table(cols)
plot(freq)
Ultimately, I want to end up with:
This is just a case of parsing the string and cutting it into the appropriate segments:
strsplit(strsplit(tweets$transcript, "(\\}\\})|(\\{\\{)")[[1]][3], "\n")[[1]][-1]
#> [1] "SKEEVE37: Oh God I ate pork yesterday before I knew about swine flu!"
#> [2] "HANNELOREEC: Without duct tape I can't seal the door to keep out swine flu but I can't get duct tape without going outside! Help!"
#> [3] "PAULYSHOREFAN: How long until the swine flu reaches me here in Madagascar?"
#> [4] "CRACKMONKEY74: Swine flu is God's punishment for the ACLU and lesbians and 9"
#> [5] "11 and nanobots!"
#> [6] "TWILIGHT7531: I fell down the stairs and there was a crack and a jagged white thing is sticking out of my arm guys is this swine flu?"
#> [7] "WIGU: @UNTOWARD: No, that sounds like syphilis, not swine flu. What did you say you did with a pig?"
#> [8] "2011SENIORSRULE: My Dad said flu vaccines are linked to autism, so to be safe from swine flu I'm trying to lick an autistic kid."
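To make the two steps in that one-liner easier to follow, here is a minimal sketch that runs without RXKCD. The `transcript` string is made up for illustration and only assumes a layout of brace-delimited blocks with newline-separated lines between them; the real `tweets$transcript` may be laid out differently, so the index `3` may need adjusting. The names `pieces` and `messages` are also just illustrative.

```r
# Hypothetical transcript, invented for illustration only; it is NOT the
# real output of getXKCD(574)$transcript.
transcript <- "{{Heading}}\nALICE: First message.\nBOB: Second message!\n{{Title text: example}}"

# Step 1: split on either "}}" or "{{". The braces are escaped with a
# double backslash because "\\" inside an R string is one literal backslash.
pieces <- strsplit(transcript, "(\\}\\})|(\\{\\{)")[[1]]

# Step 2: take the third piece (the text between the two brace blocks),
# split it on newlines, and drop the empty first element that comes from
# the leading "\n".
messages <- strsplit(pieces[3], "\n")[[1]][-1]

print(messages)
```

Inspecting `pieces` first is a reasonable way to check which element holds the lines you want before hard-coding the index.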
Answered by Allan Cameron on November 12, 2021