How to find all WGS assembly accessions of a species

Bioinformatics Asked by Oren Milman on December 12, 2020

Some background

Similar to the OP of a related question, I would like to programmatically BLAST a sequence against a local database of all WGS assemblies.
Since hosting every WGS assembly locally isn't feasible for the average biology lab server (correct me if I am wrong), I plan to use ncbi-acc-download to download all WGS assemblies of the species of interest (not a popular species like E. coli, so it should be feasible). Then I will create a BLAST database from the downloaded assemblies and BLAST the sequence against it.
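The database-building and search steps of this plan could look roughly like the sketch below. It assumes the downloaded assemblies have already been concatenated into a single nucleotide FASTA file and that BLAST+ (`makeblastdb`, `blastn`) is on the PATH. The function names, file names, and database name are made up for illustration.

```python
import shlex


def blast_commands(assemblies_fasta, query_fasta,
                   db_name="species_wgs", hits_path="hits.tsv"):
    """Return the two BLAST+ command lines as argument lists.

    All paths and the database name are hypothetical placeholders.
    """
    # Build a nucleotide BLAST database from the combined assemblies.
    make_db = ["makeblastdb", "-in", assemblies_fasta,
               "-dbtype", "nucl", "-out", db_name]
    # Search the query against that database; outfmt 6 is tabular output.
    search = ["blastn", "-query", query_fasta, "-db", db_name,
              "-outfmt", "6", "-out", hits_path]
    return [make_db, search]


def as_shell(commands):
    """Render argument lists as shell-quoted lines, e.g. for a dry run."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

Each argument list can be passed to `subprocess.run(cmd, check=True)`, or printed with `as_shell` to inspect the commands first.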

My question

How can I find all WGS assembly accessions of a species?

My current plan is to search the NCBI Assembly database using Entrez and a search term such as "wgs"[Properties] AND txid1337[orgn:exp].
EDIT: IIUC, this approach might miss some WGS assemblies. See my answer.
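A minimal sketch of that Assembly search, calling the NCBI E-utilities `esearch` endpoint with only the Python standard library. The function names and the `retmax` default are assumptions for illustration. Note that `esearch` returns Assembly UIDs rather than accessions, so a further lookup would be needed to get the accessions themselves.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def assembly_search_term(taxid):
    """Build the Entrez term for WGS assemblies under a Taxonomy UID."""
    return f'"wgs"[Properties] AND txid{taxid}[orgn:exp]'


def esearch_url(db, term, retmax=500):
    """Build an esearch request URL that asks for a JSON response."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_uids(payload):
    """Extract (total hit count, list of UIDs) from an esearch JSON reply."""
    result = json.loads(payload)["esearchresult"]
    return int(result["count"]), result["idlist"]


def search_assembly_uids(taxid, retmax=500):
    """Query the Assembly database (needs network access)."""
    url = esearch_url("assembly", assembly_search_term(taxid), retmax)
    with urlopen(url) as response:
        return parse_uids(response.read().decode("utf-8"))
```

Calling `search_assembly_uids(1337)` would return the total hit count and up to `retmax` UIDs. If the count exceeds the number of UIDs returned, the search has to be paged.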

I am worried (and thus ask for your help) that this isn't the right approach, because there seem to be at least 3 other places in which assemblies can be found.

One Answer

There seem to be WGS assemblies that can't be found in the NCBI Assembly database. I guess that such assemblies also cannot be found in the assembly_summary.txt files.

My current best guess is that each WGS assembly has a "WGS master record" in the NCBI Nuccore database. To find all WGS assemblies of a taxon whose UID in the NCBI Taxonomy database is 1337, search the NCBI Nuccore database using Entrez with the search term "wgs master"[Properties] AND txid1337[orgn:exp].
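One way to turn that Nuccore search into a list of accessions is sketched below. It runs `esearch` with the history server enabled, then asks `efetch` for the accession-only format (`rettype=acc`). The function names are hypothetical, and whether the result really covers every WGS assembly depends on the guess above being right.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def wgs_master_term(taxid):
    """Build the Entrez term for WGS master records under a Taxonomy UID."""
    return f'"wgs master"[Properties] AND txid{taxid}[orgn:exp]'


def parse_accessions(text):
    """Split efetch's plain-text accession list into clean strings."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def fetch_wgs_master_accessions(taxid):
    """Search Nuccore and fetch the matching accessions (needs network)."""
    search = {"db": "nuccore", "term": wgs_master_term(taxid),
              "usehistory": "y", "retmode": "json"}
    with urlopen(f"{EUTILS}/esearch.fcgi?{urlencode(search)}") as response:
        result = json.load(response)["esearchresult"]
    if int(result["count"]) == 0:
        return []
    # Reuse the stored search result instead of passing UIDs around.
    # Assumption: the result set is small enough for a single efetch call;
    # large sets would need paging with retstart/retmax.
    fetch = {"db": "nuccore", "query_key": result["querykey"],
             "WebEnv": result["webenv"], "rettype": "acc", "retmode": "text"}
    with urlopen(f"{EUTILS}/efetch.fcgi?{urlencode(fetch)}") as response:
        return parse_accessions(response.read().decode("utf-8"))
```

`fetch_wgs_master_accessions(1337)` would return the accessions of the matching master records, which could then be handed to a downloader such as ncbi-acc-download.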

Answered by Oren Milman on December 12, 2020
