Bioinformatics Asked on August 30, 2021
I'm trying to use the europepmc R library with a list of PMIDs that I want to look up. I tried PubTator, but it is a bit complicated. In Europe PMC I can see all the annotated terms, etc.
library("europepmc")
Example list of PMIDs:
30024784
30555165
30510081
31688884
31516032
28588019
29286103
Right now I'm looking up each ID individually with the epmc_details function, which is not practical when I have hundreds of IDs to look up:
epmc_details(ext_id = '30510081')
My question is: how can I run epmc_details in a loop (or some other way) so that I can look up the PMIDs one by one and save the results in a data frame?
epmc_details returns a list with the following components:
[1] "basic" "author_details" "journal_info" "ftx" "chemical" "mesh_topic" "mesh_qualifiers"
[8] "comments" "grants"
I would only like to save basic, chemical, mesh_topic and mesh_qualifiers in one data frame.
For example, if my first ID is 30510081, the first column of the data frame should be my ID (basically basic[1]), and the rest of the information should be appended in the following columns, such as:
ID chemical mesh_qualifiers mesh_topic gene
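One possible approach is sketched below, assuming the europepmc package is installed and that epmc_details() returns the list components shown above. collect_epmc_details() is a hypothetical helper name used only for illustration: it calls the lookup once per PMID with lapply() and stores each requested component in a list-column, so each row holds one ID plus its (possibly empty) tables. The lookup function is passed in as an argument so the sketch can be tried without network access.

```r
# Hypothetical helper (not part of europepmc): look up each PMID and keep
# selected components of the epmc_details() result, one row per ID.
collect_epmc_details <- function(ids,
                                 fetch = function(id) europepmc::epmc_details(ext_id = id),
                                 keep = c("basic", "chemical", "mesh_topic", "mesh_qualifiers")) {
  ids <- as.character(ids)
  results <- lapply(ids, function(id) {
    # A failed lookup (network error, unknown ID) becomes NULL instead of
    # stopping the whole loop
    tryCatch(fetch(id), error = function(e) {
      message("Lookup failed for ", id, ": ", conditionMessage(e))
      NULL
    })
  })
  out <- data.frame(id = ids, stringsAsFactors = FALSE)
  for (part in keep) {
    # Each cell of the list-column holds the data frame returned for that ID;
    # failed lookups and missing components become empty data frames
    out[[part]] <- lapply(results, function(r) {
      if (is.null(r[[part]])) data.frame() else r[[part]]
    })
  }
  out
}
```

For example, res <- collect_epmc_details(c("30024784", "30510081")) followed by res$mesh_topic[[2]] should give the MeSH topics of the second PMID. Keeping the components as list-columns avoids forcing tables of different shapes into one flat table; tibble::as_tibble(res) prints them compactly, and they can be expanded later (e.g. with tidyr::unnest()) if a long format is needed.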
Any suggestions or help would be highly appreciated.
I was looking at the Europe PMC site in the browser, and this was one of my queries.
When I highlight the key terms, I can see all of them annotated in the abstract itself, but when I run the same query through R I get empty results, as shown below. Why is there a difference?
$chemical
# A tibble: 0 x 0
$mesh_topic
# A tibble: 0 x 0
$mesh_qualifiers
# A tibble: 0 x 0
I found a better way of getting data from PubMed Central using the tidypmc library.
library(tidypmc)
library(dplyr)    # count(), filter(), anti_join(), %>%
library(stringr)  # str_detect()
library(purrr)    # map_int()
library(xml2)     # xml_name(), xml_find_all()
library(tibble)
library(tidytext) # unnest_tokens(), stop_words
doc <- pmc_xml("PMC6365492")
doc
txt <- pmc_text(doc)
txt
count(txt, section)
cap1 <- pmc_caption(doc)
filter(cap1, sentence == 1)
tab1 <- pmc_table(doc)
sapply(tab1, nrow)
tab1[[1]]
attributes(tab1[[2]])
collapse_rows(tab1, na.string = "-")
x <- xml_name(xml_find_all(doc, "//*"))
tibble(tag = x) %>% count(tag)
x1 <- unnest_tokens(txt, word, text) %>%
  anti_join(stop_words) %>%
  filter(!word %in% 1:100)
# Joining, by = "word"
#filter(x1, str_detect(section, "Case description"))
filter(x1, str_detect(section, "Results"))
count(x1, word, sort = TRUE)
tbls <- pmc_table(doc)
map_int(tbls, nrow)
tbls[[1]]
collapse_rows(tbls, na.string = "-")
But if I understand correctly, it can only use one PMC ID at a time. So my original question remains: how can I put this in a loop to query, say, 100 PMCIDs and store the results in a data frame?
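A minimal sketch of one way to do this, assuming tidypmc is installed: apply the per-document steps to each PMCID with lapply(), tag each result with its PMCID, and bind everything into one long data frame. collect_pmc_text() is a hypothetical helper; documents that fail to download are skipped with a message, and the fetch function is an argument so the logic can be checked offline.

```r
# Hypothetical helper: download and parse several PMC articles and stack
# their text into one data frame with an extra pmcid column.
collect_pmc_text <- function(pmcids,
                             fetch_text = function(id) tidypmc::pmc_text(tidypmc::pmc_xml(id))) {
  pieces <- lapply(pmcids, function(id) {
    txt <- tryCatch(fetch_text(id), error = function(e) {
      message("Skipping ", id, ": ", conditionMessage(e))
      NULL
    })
    if (is.null(txt) || nrow(txt) == 0) return(NULL)
    txt <- as.data.frame(txt)
    # Record which article each row came from
    txt$pmcid <- id
    txt
  })
  # Drop failed downloads, then stack the remaining data frames
  pieces <- Filter(Negate(is.null), pieces)
  if (length(pieces) == 0) return(NULL)
  do.call(rbind, pieces)
}
```

The same pattern (lapply over IDs, then bind or keep as a named list) works for the other tidypmc outputs; for tables, whose columns differ between papers, a named list per publication is usually easier than a single bound data frame.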
After using tidypmc, I found that I can parse each publication by its attributes or tags, such as title, abstract, results, etc.
Let's say I'm interested in the information in the table tags, which contain patient metadata as well as other data. If a paper contains multiple tables, I would like to store each of them in a data frame under the respective publication. Since I have multiple IDs to search and save as described above, how can I put this through a loop, or can it be done without a loop?
Any suggestions or help would be really appreciated, as always.
The idea of this code is to first convert the PMIDs to PMCIDs, then run tidypmc in a loop over the PMCIDs. The only problem is that tidypmc failed to retrieve tables for most of the IDs in your example list.
library(tidyverse)
library(tidypmc)
library(httr)
library(jsonlite)
example_pids <- c(30024784, 30555165, 30510081, 31688884, 31516032, 28588019, 29286103) %>% as.character()
#-- Convert to PMC ids
convertPIDtoPMCID <- function(pids) {
#-------- Make API request
pids4query <- paste(pids, collapse = "%0D%0A")
idconv_req <- paste0("https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?ids=", pids4query, "&idtype=pmid&format=json&versions=no&showaiid=no&tool=&email=&.submit=Submit")
pids_json <- GET(idconv_req) %>% content("text") %>% fromJSON()
#-------- Get info from JSON
pid2pmc <- pids_json$records %>%
select(pmcid, pmid) %>%
as.data.frame()
rownames(pid2pmc) <- pid2pmc$pmid
pmcids <- pid2pmc[pids, "pmcid"] # look up by PMID so the output follows the input order
return(pmcids)
}
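For hundreds of PMIDs it may be safer to send the IDs in batches rather than in one request. The sketch below assumes the NCBI ID Converter accepts at most 200 IDs per request (worth checking against the current NCBI documentation) and reuses convertPIDtoPMCID() from above; convert_in_batches() and its pause argument are hypothetical additions that space out the requests.

```r
# Hypothetical wrapper: convert a long PMID vector in batches, assuming a
# limit of 200 IDs per ID Converter request.
convert_in_batches <- function(pids, convert = convertPIDtoPMCID,
                               batch_size = 200, pause = 0.5) {
  # Batch 1 holds IDs 1-200, batch 2 holds IDs 201-400, and so on
  batches <- split(pids, ceiling(seq_along(pids) / batch_size))
  pmcids <- lapply(batches, function(batch) {
    res <- convert(batch)
    Sys.sleep(pause)  # wait between requests to avoid hammering the server
    res
  })
  # Batches are processed in order, so the output order matches the input
  unlist(pmcids, use.names = FALSE)
}
```

Usage would be example_pmcids <- convert_in_batches(example_pids); PMIDs without a PMC version should still come back as NA, exactly as with a single call.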
example_pmcids <- convertPIDtoPMCID(example_pids)
#-- Try to get data with tidypmc
pub_tables <- lapply(example_pmcids, function(pmc_id) {
message("-- Trying ", pmc_id, "...")
doc <- tryCatch(pmc_xml(pmc_id),
error = function(e) {
message("------ Failed to recover PMCID")
return(NULL)
})
if(!is.null(doc)) {
#-- If succeed, try to get table
tables <- pmc_table(doc)
if(!is.null(tables)) {
#-- If succeed, try to get table name
table_caps <- pmc_caption(doc) %>%
filter(tag == "table")
names(tables) <- paste(table_caps$label, table_caps$text, sep = " - ")
}
return(tables)
} else {
#-- If fail, return NA
return(NA)
}
})
names(pub_tables) <- example_pids
#-- Inspect results
pub_tables$`30555165`$`Table 1 - Patient demographic and baseline characteristics`
pub_tables$`29286103`$`Table I - Sample summary.`
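As a starting point for that tidying, the nested pub_tables list (PMID to a named list of tables) can be flattened into one data frame with one row per table. This is a minimal base-R sketch; flatten_pub_tables() is a hypothetical helper, and it assumes failed lookups are stored as NA and papers without tables as NULL, as in the code above.

```r
# Hypothetical helper: turn a list of the form PMID -> list of tables into a
# data frame with columns pmid, table_name and a list-column of tables.
flatten_pub_tables <- function(pub_tables) {
  # Keep only entries that are non-empty lists of tables (skips NA and NULL)
  ok <- vapply(pub_tables, function(t) is.list(t) && length(t) > 0, logical(1))
  kept <- pub_tables[ok]
  if (length(kept) == 0) {
    return(data.frame(pmid = character(), table_name = character()))
  }
  out <- data.frame(
    # Repeat each PMID once per table it contributed
    pmid = rep(names(kept), times = lengths(kept)),
    # Use the caption-based names when present, NA otherwise
    table_name = unlist(lapply(kept, function(t) {
      if (is.null(names(t))) rep(NA_character_, length(t)) else names(t)
    }), use.names = FALSE),
    stringsAsFactors = FALSE
  )
  # One table per row, kept intact in a list-column
  out$table <- unlist(lapply(kept, unname), recursive = FALSE, use.names = FALSE)
  out
}
```

With this layout, all_tables <- flatten_pub_tables(pub_tables) lets you filter by PMID or by caption text before tidying the individual tables in all_tables$table.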
Tables will require quite a bit of tidying after this. Good luck!
Correct answer by csgroen on August 30, 2021