Bioinformatics Asked on December 6, 2020
I want to download the original BAM files that the authors uploaded to SRA. Normally, I would just use sam-dump, but these files have problems that seem related to this issue. Since, according to the entry, AWS S3 also hosts the original BAM files, I thought I could download these directly.
NCBI documentation implies that I can’t download these directly, but that I can freely copy them to other AWS locations within the same region. To this end, I created my own S3 bucket (mm-mneuron) and am now trying to copy from the SRA bucket to mine. Here’s what I tried:
import boto3
import botocore

s3 = boto3.resource('s3')

bam_file = {
    'Bucket': 'sra-pub-src-6',
    'Key': 'SRR5253957/RPI25_0.bam'
}

my_bucket = s3.Bucket('mm-mneuron')
my_bucket.copy(bam_file, 'RPI25_0.bam')
This fails with:
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
That is, it sounds like I can’t access the SRA bucket. I’ve tested downloading and uploading to my bucket, so I know I have write permissions. Not sure what else to try here.
How can I access the SRA data on S3?
A member of the SRA submission staff pointed out that using
prefetch --type all SRR5253957
will download the original files. In this case, that means running the command on an EC2 instance in the same region as the S3 bucket (here, us-east-1), with SRA Toolkit installed and configured to work from AWS (as per this documentation).
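If you would rather drive this from Python (as in the question), a minimal sketch might look like the following. It only wraps the prefetch command quoted above. The helper names build_prefetch_command and run_prefetch are hypothetical, and the sketch assumes SRA Toolkit is already installed and configured on the instance.

```python
import shutil
import subprocess


def build_prefetch_command(accession, file_type="all"):
    """Return the argument list for `prefetch --type <file_type> <accession>`."""
    return ["prefetch", "--type", file_type, accession]


def run_prefetch(accession):
    """Run prefetch for one accession.

    Returns the exit code of prefetch, or None if the prefetch
    executable is not on PATH (SRA Toolkit missing or not set up).
    """
    if shutil.which("prefetch") is None:
        return None
    # check=False: report the exit code rather than raising on failure.
    completed = subprocess.run(build_prefetch_command(accession), check=False)
    return completed.returncode
```

Calling run_prefetch("SRR5253957") on the EC2 instance would then run the same command as above; a None result suggests the toolkit is not on PATH.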
Unfortunately, the particular files I am concerned with are not currently accessible, but this approach should work in most situations.
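For anyone hitting the same 403 from boto3, a small probe like the one below may help narrow things down before attempting a copy. It is a sketch, not a fix: it assumes boto3 is installed and AWS credentials are configured, and the names explain_head_error and probe_object are hypothetical. One thing worth knowing is that S3 can answer HeadObject with 403 even when the key simply does not exist, if the caller lacks permission to list the bucket, so a 403 alone does not prove the object is there.

```python
def explain_head_error(code):
    """Map the error code of a failed HeadObject call to a likely cause."""
    if code in ("403", "AccessDenied"):
        return ("forbidden: access is denied, or the key is missing and "
                "the caller may not list the bucket")
    if code in ("404", "NoSuchKey", "NotFound"):
        return "not found: the key does not exist in this bucket"
    return "unexpected error code: " + code


def probe_object(bucket, key):
    """Issue HeadObject for one key and describe the outcome."""
    # Imported lazily so the helper above is usable without boto3.
    import boto3
    from botocore.exceptions import ClientError

    client = boto3.client("s3")
    try:
        client.head_object(Bucket=bucket, Key=key)
    except ClientError as err:
        return explain_head_error(err.response["Error"]["Code"])
    return "ok: the object is readable with the current credentials"
```

For the file in the question, the call would be probe_object('sra-pub-src-6', 'SRR5253957/RPI25_0.bam').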
Answered by merv on December 6, 2020