Bioinformatics Asked on August 22, 2021
How can I programmatically obtain FTP links to RNA-seq FASTQ files in ENA? Here’s an example of a link that I would like to obtain:
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR824/000/SRR8240860/SRR8240860_1.fastq.gz
In particular, is there a tool that, given the BioProject ID (here, PRJNA506829), would give me all the FTP links for the runs in the project, or would I need to write a web scraper to do it?
pysradb can fetch ENA/SRA fastq/bam links (if available):
$ pysradb metadata SRR8240860 --detailed
run_accession study_accession experiment_accession experiment_title experiment_desc organism_taxid organism_name library_strategy library_source library_selection library_layout sample_accession sample_title instrument total_spots total_size run_total_spots run_total_bases run_alias sra_url_alt1 sra_url_alt2 sra_url experiment_alias source_name strain/genotype developmental stage ena_fastq_http ena_fastq_http_1 ena_fastq_http_2 ena_fastq_ftp ena_fastq_ftp_1 ena_fastq_ftp_2
SRR8240860 SRP170618 SRX5059122 GSM3487689: fer-15 Day1 rep1; Caenorhabditis elegans; RNA-Seq GSM3487689: fer-15 Day1 rep1; Caenorhabditis elegans; RNA-Seq 6239 Caenorhabditis elegans RNA-Seq TRANSCRIPTOMIC cDNA PAIRED SRS4075216 N/A HiSeq X Ten 40494415 4755049874 40494415 12148324500 GSM3487689_r1 gs://sra-pub-src-3/SRR8240860/RRA0719_R2.fq.gz s3://sra-pub-src-3/SRR8240860/RRA0719_R2.fq.gz https://sra-downloadb.st-va.ncbi.nlm.nih.gov/sos2/sra-pub-run-3/SRR8240860/SRR8240860.1 GSM3487689 whole worm fer-15(b26ts) Adult day 1 N/A http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR824/000/SRR8240860/SRR8240860_1.fastq.gz http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR824/000/SRR8240860/SRR8240860_2.fastq.gz N/A [email protected]:vol1/fastq/SRR824/000/SRR8240860/SRR8240860_1.fastq.gz [email protected]:vol1/fastq/SRR824/000/SRR8240860/SRR8240860_2.fastq.gz
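To collect links for every run in a project rather than a single run, the same command can be pointed at the study accession (SRP170618 in the output above) and the table saved to a file. The sketch below assumes the saved file is tab-separated with a header row; `tsv_column` is a hypothetical helper that pulls out a column by name, and `metadata.tsv` is just an illustrative file name.

```shell
# tsv_column NAME FILE: print the values of the column whose header is NAME.
# Hypothetical helper; assumes FILE is tab-separated with a header row.
tsv_column() {
  awk -F'\t' -v name="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
    col     { print $col }
  ' "$2"
}

# Usage (needs network access and pysradb installed):
#   pysradb metadata SRP170618 --detailed --saveto metadata.tsv
#   tsv_column ena_fastq_http_1 metadata.tsv
#   tsv_column ena_fastq_http_2 metadata.tsv
```

Looking the column up by header name rather than by position keeps the helper working if the set of metadata columns differs between studies.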
Answered by rightskewed on August 22, 2021
ENA has an API that you can query programmatically.
If you only need files for a few projects, you can do it in the browser instead: search for your accession ID (leading to https://www.ebi.ac.uk/ena/browser/view/PRJNA506829), restrict the "show selected columns" filter to fastq_ftp only, then click "download tsv" to get the list of FTP links.
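For more than a handful of projects, the same table can be requested without the browser. This is a minimal sketch, assuming the ENA Portal API's `filereport` endpoint and the `fastq_ftp` column mentioned above. `ena_filereport_url` and `split_links` are hypothetical helper names, and the semicolon-separated, scheme-less layout of `fastq_ftp` for paired-end runs is an assumption about the response format.

```shell
# Build the file-report URL for a project or run accession.
ena_filereport_url() {
  printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=run_accession,fastq_ftp&format=tsv\n' "$1"
}

# Read the TSV report on stdin and print one ftp:// link per line.
# The fastq_ftp column is located by its header name; empty cells are skipped.
split_links() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "fastq_ftp") col = i; next }
    col && $col != "" {
      n = split($col, parts, ";")
      for (j = 1; j <= n; j++) print "ftp://" parts[j]
    }
  '
}

# Usage (needs network access):
#   curl -s "$(ena_filereport_url PRJNA506829)" | split_links > ftp_links.txt
```

The resulting file holds one link per line, so it can be passed directly to a downloader, for example with `wget -i ftp_links.txt`.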
Answered by Pallie on August 22, 2021
Here are some bash commands I used for that purpose.
Simply prepare a text file listing each run accession (SRR/ERR) you want and loop over it. Here I used prozilla to speed up the downloads, but you can use wget instead.
```
# Download both mates of each paired-end run listed in list_of_accessions.
base=ftp.sra.ebi.ac.uk/vol1/fastq
for index in $(cat list_of_accessions); do
  # Accessions longer than nine characters sit in an extra sub-directory made
  # of the characters after the ninth, zero-padded to three digits.
  case ${#index} in
    9)  dir=${index:0:6} ;;
    10) dir=${index:0:6}/00${index:9} ;;
    11) dir=${index:0:6}/0${index:9} ;;
    *)  dir=${index:0:6}/${index:9} ;;
  esac
  for mate in 1 2; do
    proz -k=6 --no-curses "$base/$dir/$index/${index}_${mate}.fastq.gz"
  done
done
```
Answered by thomas duge de bernonville on August 22, 2021