Bioinformatics Asked by hepcat72 on September 10, 2020
I’m updating a Galaxy tool wrapper for Entrez’s eutils suite, and I’m trying to create a form with valid link selections (among other things) based on the "from" and "to" databases, to reduce the size of the select list.
Einfo doesn’t return these links for the gene or nuccore databases. I’d found this resource, which seemed to show all the possible links, but it doesn’t show these links either. Yet, while working on elink’s acheck feature using some gene IDs in a test run, I saw 2 links that were not listed in either resource:
I manually checked that I can use those as link names in valid elink queries, and they did return valid results (i.e., no error).
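One way to repeat that manual check is to build the elink query URL directly with the linkname parameter. This is a minimal Python sketch, assuming the public E-utilities elink endpoint and its dbfrom/db/id/linkname parameters; the helper name build_elink_url and the gene UID are illustrative placeholders, and the code only builds the URL rather than fetching it.

```python
from urllib.parse import urlencode

# Public E-utilities elink endpoint.
ELINK_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def build_elink_url(dbfrom, db, ids, linkname):
    """Return an elink query URL restricted to a single linkname.

    build_elink_url is a hypothetical helper; the query parameters are the
    standard elink ones (dbfrom, db, id, linkname).
    """
    params = {
        "dbfrom": dbfrom,
        "db": db,
        # Comma-separated UIDs in a single id parameter.
        "id": ",".join(str(i) for i in ids),
        "linkname": linkname,
    }
    return ELINK_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder UID and linkname; substitute the ones being tested.
    print(build_elink_url("gene", "nuccore", [7157], "gene_nuccore"))
```

Fetching the printed URL with any HTTP client runs the same query. NCBI's usage guidelines also ask automated clients to send tool and email parameters, which are left out here for brevity.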
So I ran elink on the gene database using a few random UIDs to see if I would get back a comprehensive list of links (albeit possibly with empty results for some of them), but it was also missing these 2 links. I’m concerned that I’m not getting all of the possible links for my form interface, that the missing links could produce valid results, and that users may want to link via them, given prior knowledge that they exist.
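An elink query with cmd=acheck lists the available links for each UID supplied. A minimal sketch of pulling the link names out of such a response, assuming an eLinkResult/LinkSet/IdCheckList/IdLinkSet/LinkInfo layout with DbTo and LinkName children; the XML below is a hand-written, abridged sample rather than real output, and acheck_linknames is a hypothetical helper. What comes back depends on the UIDs supplied, so a run like this cannot by itself prove the list is complete.

```python
import xml.etree.ElementTree as ET

# Hand-written, abridged sample in the assumed acheck layout (not real output).
SAMPLE_ACHECK = """<eLinkResult>
  <LinkSet>
    <DbFrom>gene</DbFrom>
    <IdCheckList>
      <IdLinkSet>
        <Id>7157</Id>
        <LinkInfo>
          <DbTo>nuccore</DbTo>
          <LinkName>gene_nuccore</LinkName>
        </LinkInfo>
        <LinkInfo>
          <DbTo>pubmed</DbTo>
          <LinkName>gene_pubmed</LinkName>
        </LinkInfo>
      </IdLinkSet>
    </IdCheckList>
  </LinkSet>
</eLinkResult>"""


def acheck_linknames(xml_text, db_to=None):
    """Collect the distinct LinkName values from an acheck response.

    If db_to is given (a set of database names), keep only links whose
    DbTo is in that set.
    """
    root = ET.fromstring(xml_text)
    names = set()
    for info in root.iter("LinkInfo"):
        if db_to is not None and info.findtext("DbTo") not in db_to:
            continue
        name = info.findtext("LinkName")
        if name:
            names.add(name)
    # Sorted so results from several UIDs can be compared directly.
    return sorted(names)


if __name__ == "__main__":
    print(acheck_linknames(SAMPLE_ACHECK))
    print(acheck_linknames(SAMPLE_ACHECK, {"nuccore", "nucleotide"}))
```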
How do I get a comprehensive list of database links from Entrez and why doesn’t einfo return all possible links?
I've been corresponding with NLM about this issue, and I finally took the time to try out their suggestion. I personally found it hard to parse, and it is not a discrete solution but rather a very time-consuming manual process that includes false positives. I tried it anyway because, according to NLM, a formal and comprehensive response to my query will take time:
Collecting the linkname will be a difficult task, that will take time and coordinate/check with relevant groups maintaining individual databases. Your patience will be greatly apprecired. [sic]
It took me a bit to decipher their suggestion on how to manually find all links, so I will share what I've learned. There is (currently) no codified or formal means to obtain all possible link names, and I infer that many links exist simply because they are used in the internal workings of their website. I could be wrong about that, but suffice it to say that there are many undocumented links and they change constantly.
You can get a list of filter items using each individual database's advanced search web interface, "most of which should be the linkname from that source database to other target database". (So this should yield all possible links, mixed in with some false positives.)
Here's how you do that. Let's take the gene database as an example:
Locate the filter item gene all (29994947) and click to select that item. The multi-select list will be repopulated, and most of the items will be linknames (though you have to replace spaces with underscores for those linknames to work with the elink utility).
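The space-to-underscore conversion, plus narrowing the candidates to nuccore/nucleotide targets, can be sketched as follows. This assumes linknames follow a source_target[_qualifier] pattern, which is a heuristic rather than a documented rule, so false positives from the filter list can still slip through and each candidate should be confirmed with an elink query. labels_to_linknames and the sample labels are hypothetical.

```python
def labels_to_linknames(labels, source_db, targets):
    """Turn advanced-search filter labels into candidate elink linknames.

    Heuristic (an assumption, not a documented rule): a linkname looks like
    <source_db>_<target_db>[_<qualifier>]. Only labels whose target segment
    is in `targets` are kept; everything else is treated as a non-link item.
    """
    prefix = source_db + "_"
    candidates = set()
    for label in labels:
        # elink expects underscores where the filter list shows spaces.
        name = "_".join(label.split())
        if not name.startswith(prefix):
            continue
        target = name[len(prefix):].split("_", 1)[0]
        if target in targets:
            candidates.add(name)
    return sorted(candidates)


if __name__ == "__main__":
    # Hypothetical filter labels, for illustration only.
    labels = [
        "gene all",
        "gene nuccore",
        "gene nuccore refseqgene",
        "gene nucleotide",
        "gene nucest",
        "gene pubmed",
    ]
    print(labels_to_linknames(labels, "gene", {"nuccore", "nucleotide"}))
```

Under this heuristic, items whose target segment is some other database (for example an EST database, or a non-link item such as "gene all") are dropped.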
Doing this for the gene database and then scrolling through just the links to nuccore (and its alias "nucleotide", ignoring est and gss), you will find:
If you look only at the linknames in the documentation or via einfo (by supplying the gene database and looking at the corresponding links to nuccore), you only get these links:
So, given this information and the ever-changing nature of these links, I will allow the user the option to enter a linkname manually if the set in the select list I generate via einfo does not contain the link they need.
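The form logic described here, a select list built from einfo plus an optional free-text override, might look like the sketch below. It assumes an einfo layout of eInfoResult/DbInfo/LinkList/Link with Name and DbTo children; the sample XML is hand-written and abridged, and einfo_linknames and resolve_linkname are hypothetical helpers, not part of Galaxy or E-utilities. How the Galaxy tool XML exposes the select and text fields is not shown.

```python
import xml.etree.ElementTree as ET

# Hand-written, abridged sample in the assumed einfo layout (not real output).
SAMPLE_EINFO = """<eInfoResult>
  <DbInfo>
    <DbName>gene</DbName>
    <LinkList>
      <Link>
        <Name>gene_nuccore</Name>
        <DbTo>nuccore</DbTo>
      </Link>
      <Link>
        <Name>gene_pubmed</Name>
        <DbTo>pubmed</DbTo>
      </Link>
    </LinkList>
  </DbInfo>
</eInfoResult>"""


def einfo_linknames(xml_text, db_to):
    """Return sorted link names from an einfo response whose DbTo is in db_to."""
    root = ET.fromstring(xml_text)
    names = set()
    for link in root.iterfind(".//LinkList/Link"):
        name = link.findtext("Name")
        if name and link.findtext("DbTo") in db_to:
            names.add(name)
    return sorted(names)


def resolve_linkname(selected, manual=None):
    """Prefer a non-blank manual entry over the select-list choice.

    The manual entry is normalized by collapsing whitespace to underscores,
    so a label copied from the advanced search filter list also works.
    """
    if manual and manual.strip():
        return "_".join(manual.split())
    return selected


if __name__ == "__main__":
    options = einfo_linknames(SAMPLE_EINFO, {"nuccore", "nucleotide"})
    print(options)
    print(resolve_linkname(options[0]))
    # Hypothetical label typed in by a user.
    print(resolve_linkname(options[0], "gene nuccore mrna"))
```

Reusing the space-to-underscore rule for the manual field means a label copied straight from the filter list still works; whether the name is actually valid can only be confirmed by running the elink query.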
Correct answer by hepcat72 on September 10, 2020