Bioinformatics Asked on October 17, 2020
I am working with short-read whole-genome sequences from the NCBI’s SRA. I have aligned and sorted all of my short-read sequences and am attempting to index each sequence into .bai format using samtools index, but am running into a couple of errors.
I unpacked the original .sra files in the following manner:
fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files SRR6509138.sra
I then aligned each fastq paired-end read, removed duplicate reads, and converted each to bam format like so:
bwa mem xiphophorus_birchmanni_10x_12Sep2018_yDAA6.fasta SRR6509136_1.fastq SRR6509136_2.fastq | samblaster -e -r | samtools view -Sb - > blasted_SRR6509136.bam
Some of my files did not come as paired-end reads. For those files, I used the --ignoreUnmated flag on samblaster.
I then sorted each bam file like so:
samtools sort blasted_SRR6649368.bam -o sorted_SRR6649368.bam -n
I am attempting to obtain a bai file for each bam using the following command:
samtools index sorted_SRR6649368.bam sorted_SRR6649368.bam.bai
This is the error I run into for the unpaired reads:
[E::hts_idx_push] Chromosome blocks not continuous
[E::sam_index] Read 'HWI-ST387:164:D0CJWACXX:4:1101:1129:13264' with ref_name='ScdB1pO_646;HRSCAF=880', ref_length=33092979, flags=16, pos=23260108 cannot be indexed
samtools index: failed to create index for "sorted_SRR791885.bam": No such file or directory
This is the error I run into for the paired reads:
[E::hts_idx_push] Unsorted positions on sequence #79: 11159020 followed by 11158717
samtools index: failed to create index for "sorted_SRR6649368.bam"
Can anyone help me figure out why this is happening?
samtools sort blasted_SRR6649368.bam -o sorted_SRR6649368.bam -n
These error messages indicate that the reads are not sorted by coordinate. In particular, the first says that the reads mapped to ScdB1pO_646;HRSCAF=880 are not all together, and the second says that a read at position 11159020 is followed by one at position 11158717, so the reads are not in position order on their chromosome.
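One way to confirm how a BAM file is sorted before trying to index it is to look at the SO field on the @HD header line, which samtools sort normally sets to queryname or coordinate. A minimal sketch, where sort_order is a hypothetical helper name and the file name is taken from the question:

```shell
# Print the SO (sort order) value from a SAM header read on stdin.
# sort_order is a hypothetical helper name, not part of samtools.
sort_order() {
    awk -F'\t' '$1 == "@HD" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SO:/) { print substr($i, 4); exit }
    }'
}

# Usage (requires samtools and the BAM file from the question):
#   samtools view -H sorted_SRR6649368.bam | sort_order
```

If this prints queryname rather than coordinate, samtools index can be expected to fail as described above.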
This is because samtools sort -n has been used to sort the reads by name instead. Remove -n to sort by position, which is what is needed to prepare a BAM file for indexing with samtools index.
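Put together, the corrected steps might look like the sketch below. The function name sort_and_index is a hypothetical wrapper, and the file names in the usage comment are the ones from the question.

```shell
# Coordinate-sort a BAM file, then index it.
# sort_and_index is a hypothetical wrapper name, not a samtools command.
sort_and_index() {
    in_bam=$1
    out_bam=$2
    # No -n here: without it, samtools sort orders reads by coordinate.
    samtools sort -o "$out_bam" "$in_bam" &&
        samtools index "$out_bam"   # writes "$out_bam.bai" by default
}

# Usage (requires samtools and the input file):
#   sort_and_index blasted_SRR6649368.bam sorted_SRR6649368.bam
```

The index step runs only if the sort succeeds, so a failed sort does not leave an index built from a stale file.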
Correct answer by John Marshall on October 17, 2020