Bioinformatics Asked by bioinfonext on November 25, 2020
Below are the statistics for RNA-seq mapped and unmapped paired-end reads to rice genome using reformat.sh from bbtools on bam files. It gives 77% mapped and 5% unmapped, what about the remaining 18% reads? How I can get information for all reads?
Command;
reformat.sh in=Leaf_T1_F_R10_S1_L001.bam out=Leaf_T1_F_R10_S1_L001.mapped.bam, mappedonly
Unspecified format for output Leaf_T1_F_R10_S1_L001.mapped.bam,; defaulting to fastq.
Found sambamba.
Input is being processed as unpaired
Input: 67471075 reads 9312399845 bases
Output: 52020538 reads (77.10%) 7208830835 bases (77.41%)
Time: 171.212 seconds.
Reads Processed: 67471k 394.08k reads/sec
Bases Processed: 9312m 54.39m bases/sec
For unmapped read;
Command;
reformat.sh in=Leaf_T1_F_R10_S1_L001.bam out=Leaf_T1_F_R10_S1_L001.unmapped.bam, unmappedonly
Unspecified format for output Leaf_T1_F_R10_S1_L001.unmapped.bam,; defaulting to fastq.
Found sambamba.
Input is being processed as unpaired
Input: 67471075 reads 9312399845 bases
Output: 3435326 reads (5.09%) 465456267 bases (5.00%)
Time: 142.978 seconds.
Reads Processed: 67471k 471.90k reads/sec
Bases Processed: 9312m 65.13m bases/sec
With sambamba; Can I say 94.91% reads are mapped and 5.09% are unmapped?
67471075 + 0 in total (QC-passed reads + QC-failed reads)
12015211 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
64035749 + 0 mapped (94.91%:N/A)
55455864 + 0 paired in sequencing
27727932 + 0 read1
27727932 + 0 read2
51006046 + 0 properly paired (91.98%:N/A)
51419222 + 0 with itself and mate mapped
601316 + 0 singletons (1.08%:N/A)
263816 + 0 with mate mapped to a different chr
237111 + 0 with mate mapped to a different chr (mapQ>=5)
Try samtools flagstat
or sambamba flagstat
to get similar type of information about your bam file. It may be easier to interpret. Both of these tools can be installed using conda
if needed.
EDIT:
Yes, the results of sambamba flagstat
you showed in your edited post indicate exactly what you said: 94.91%
mapped and 100% - 94.91% = 5.09%
unmapped:
64035749 + 0 mapped (94.91%:N/A)
Correct answer by Timur Shtatland on November 25, 2020
More of a comment than an answer... In my opinion the terminology is a bit confusing since the term reads actually refers to a more generic records.
Look at this test file:
samtools view test.sam | cut -f 1-4
HSQ9103:404:C6F0VANXX:1:2208:4363:50381 0 chr7 5529094
HSQ9103:404:C6F0VANXX:1:2208:4363:50381 0 chr7 5529194
HSQ9103:404:C6F0VANXX:1:2208:4363:50381 0 chr7 5529194
HSQ9103:404:C6F0VANXX:1:2208:4363:50381 0 chr7 5529194
HSQ9103:403:C6F0HANXX:8:2215:4792:20065 4 chr7 5529871
I would say that here there are 5 records (of which 4 are alignments) but only 2 reads.
samtools flagstat
reports 80% mapping rate but maybe I would say this is 50%.
samtools flagstat test.sam
5 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
4 + 0 mapped (80.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Answered by dariober on November 25, 2020
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP