Bioinformatics Asked by user432797 on January 6, 2021
I have this GSE dataset ( GSE104279 ) (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE104279).
I want to make a table with set IDs and FTP URLs to use as a table in galaxy.org.
I know that we can use ENA to get a specific arrangement:
https://www.ebi.ac.uk/ena/browser/
I tried to get:
SampleID Group URL
so I used:
https://www.ebi.ac.uk/ena/browser/view/PRJNA412223
But nothing is showing.
Is there a way to get these URLs in the arrangement above?
You can use Entrez Direct for this as follows:
esearch -db gds -query 'GSE104279' \
  | esummary \
  | xtract -pattern DocumentSummary \
      -if 'entryType' -equals 'GSM' \
      -def 'NA' -element Accession title summary FTPLink
This will return a table with data similar to this:
GSM2580330 18_Z13_2_d5_Zika cortical organoids_Zika_5d ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM2580nnn/GSM2580330/
GSM2580329 17_Z13_2_d5_Control cortical organoids_mock_5d ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM2580nnn/GSM2580329/
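To get closer to the three-column layout asked for in the question, the table can be trimmed with awk. This is only a sketch: it assumes the xtract output is tab-separated with the accession first, a descriptive label second, and the FTP link last. The function name to_three_columns is made up for illustration, and the Group value would still need to be derived from the label by hand.

```shell
# Keep the first, second, and last tab-separated fields of each line:
# sample accession, descriptive label, and FTP link.
to_three_columns() {
  awk -F'\t' -v OFS='\t' 'NF >= 3 { print $1, $2, $NF }'
}

# Usage sketch (needs Entrez Direct and network access):
#   esearch ... | esummary | xtract ... | to_three_columns > samples.tsv
```

Lines with fewer than three fields are skipped rather than guessed at.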
To download sequence reads, you should follow links to SRA. Using Entrez Direct you can do this as follows:
esearch -db gds -query 'GSE104279' \
  | elink -target sra \
  | efetch -format runinfo
This will return a comma-delimited table containing SRA identifiers and an FTP path to the SRA data. These FTP paths won't lead you to FASTQ files, though. You can pass the SRA run identifiers of the format SRR### (or ERR### or DRR###) to the fastq-dump or fasterq-dump tools from the SRA Toolkit to download data in FASTQ format.
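As a rough sketch of that last step, the run identifiers can be pulled out of the runinfo table and handed to fasterq-dump one at a time. The helper name extract_runs is hypothetical. It assumes the header row names the run column "Run" and that simple comma splitting is good enough for that column. It also skips blank lines and repeated header rows, in case the output contains them.

```shell
# Print the values of the "Run" column from a runinfo CSV read on stdin.
# The column is located by its header name rather than by position.
extract_runs() {
  awk -F',' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Run") col = i; next }
    col && $col != "" && $col != "Run" { print $col }
  '
}

# Usage sketch (needs Entrez Direct, the SRA Toolkit, and network access):
#   esearch -db gds -query 'GSE104279' | elink -target sra \
#     | efetch -format runinfo > runinfo.csv
#   extract_runs < runinfo.csv | while read -r run; do fasterq-dump "$run"; done
```

Saving the runinfo table to a file first makes it easy to inspect which runs will be downloaded before starting a potentially large transfer.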
Correct answer by vkkodali on January 6, 2021