TransWikia.com

Mapping statistics from bam file using bbtools and sambamba

Bioinformatics Asked by bioinfonext on November 25, 2020

Below are the statistics for RNA-seq mapped and unmapped paired-end reads to rice genome using reformat.sh from bbtools on bam files. It gives 77% mapped and 5% unmapped, what about the remaining 18% reads? How I can get information for all reads?

Command;
reformat.sh in=Leaf_T1_F_R10_S1_L001.bam out=Leaf_T1_F_R10_S1_L001.mapped.bam, mappedonly


Unspecified format for output Leaf_T1_F_R10_S1_L001.mapped.bam,; defaulting to fastq.
Found sambamba.
Input is being processed as unpaired
Input:                   67471075 reads           9312399845 bases
Output:                 52020538 reads (77.10%) 7208830835 bases (77.41%)

Time:                         171.212 seconds.
Reads Processed:      67471k 394.08k reads/sec
Bases Processed:       9312m 54.39m bases/sec

For unmapped read;
Command;
reformat.sh in=Leaf_T1_F_R10_S1_L001.bam out=Leaf_T1_F_R10_S1_L001.unmapped.bam, unmappedonly

Unspecified format for output Leaf_T1_F_R10_S1_L001.unmapped.bam,; defaulting to fastq.
Found sambamba.
Input is being processed as unpaired
Input:                   67471075 reads           9312399845 bases
Output:                 3435326 reads (5.09%) 465456267 bases (5.00%)

Time:                         142.978 seconds.
Reads Processed:      67471k 471.90k reads/sec
Bases Processed:       9312m 65.13m bases/sec

With sambamba; Can I say 94.91% reads are mapped and 5.09% are unmapped?

67471075 + 0 in total (QC-passed reads + QC-failed reads)
12015211 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
64035749 + 0 mapped (94.91%:N/A)
55455864 + 0 paired in sequencing
27727932 + 0 read1
27727932 + 0 read2
51006046 + 0 properly paired (91.98%:N/A)
51419222 + 0 with itself and mate mapped
601316 + 0 singletons (1.08%:N/A)
263816 + 0 with mate mapped to a different chr
237111 + 0 with mate mapped to a different chr (mapQ>=5)

2 Answers

Try samtools flagstat or sambamba flagstat to get similar type of information about your bam file. It may be easier to interpret. Both of these tools can be installed using conda if needed.

EDIT:

Yes, the results of sambamba flagstat you showed in your edited post indicate exactly what you said: 94.91% mapped and 100% - 94.91% = 5.09% unmapped:

64035749 + 0 mapped (94.91%:N/A)

Correct answer by Timur Shtatland on November 25, 2020

More of a comment than an answer... In my opinion the terminology is a bit confusing since the term reads actually refers to a more generic records.

Look at this test file:

samtools view test.sam | cut -f 1-4
HSQ9103:404:C6F0VANXX:1:2208:4363:50381 0   chr7    5529094
HSQ9103:404:C6F0VANXX:1:2208:4363:50381 0   chr7    5529194
HSQ9103:404:C6F0VANXX:1:2208:4363:50381 0   chr7    5529194
HSQ9103:404:C6F0VANXX:1:2208:4363:50381 0   chr7    5529194
HSQ9103:403:C6F0HANXX:8:2215:4792:20065 4   chr7    5529871

I would say that here there are 5 records (of which 4 are alignments) but only 2 reads.

samtools flagstat reports 80% mapping rate but maybe I would say this is 50%.

samtools flagstat test.sam 
5 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
4 + 0 mapped (80.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Answered by dariober on November 25, 2020

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP