Bioinformatics Asked by init_js on August 22, 2021
Our BAM files are created according to a "lossless" alignment procedure [1] from the Broad Institute GATK documentation, which involves re-adding the unaligned/unmapped reads into an aligned BAM using Picard’s MergeBamAlignment.
The BAM files produced in the end contain both the mapped and the unmapped reads. These files are then sorted with SortSam [2], so that the sort order in the header becomes:
@HD VN:1.6 SO:coordinate
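For reference, here is a minimal Python sketch for checking the declared sort order and counting mapped versus unmapped records. It assumes the BAM has first been dumped to tab-separated SAM text (for example with `samtools view -h`). The function names and the two-record example are made up for illustration.

```python
# Sketch: inspect SAM text for the declared sort order and for records
# with the unmapped FLAG bit (0x4) set. Fields are assumed tab-separated,
# as in real SAM output.
UNMAPPED = 0x4


def sort_order(sam_text):
    """Return the SO value from the @HD header line, or None if absent."""
    for line in sam_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


def count_mapped_unmapped(sam_text):
    """Return (mapped, unmapped) record counts based on FLAG bit 0x4."""
    mapped = unmapped = 0
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & UNMAPPED:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


# Hypothetical two-record example: one mapped read, one read with no position.
EXAMPLE = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
])

if __name__ == "__main__":
    print(sort_order(EXAMPLE))
    print(count_mapped_unmapped(EXAMPLE))
```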
How does MarkDuplicates handle the unmapped reads of a BAM file containing both unmapped and mapped reads?
Note: MarkDuplicates normally seems to take the BAM’s ordering into account; namely, it accepts arguments such as --ASSUME_SORT_ORDER X. However, it’s not specified whether reads without a position are ignored or have to be compared with all other possible reads.
Disclaimer: I initially posted this question on the GATK forum [3], but I’m also asking here in the hope of reaching a broader audience.
From the Picard documentation:
DUPLICATION METRICS: Metrics that are calculated during the process of marking duplicates within a stream of SAMRecords.
UNMAPPED_READS The total number of unmapped reads examined. (Primary, non-supplemental)
MarkDuplicates won't alter the flags on unmapped reads, but it will count them in the metrics report it generates. You should be able to test this yourself with a small set of mapped and unmapped reads.
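One way to set up such a test is sketched below in Python. It builds a tiny coordinate-sorted SAM with two identical mapped reads and two unmapped reads, and provides a helper for tallying the duplicate flag (0x400) by mapping status. The read names, sequences, read group and file names are all made up, and the Picard command in the comment assumes a local `picard.jar`; treat this as a sketch rather than a verified procedure.

```python
# Sketch of the suggested experiment. All read names, sequences and file
# names here are hypothetical.
UNMAPPED = 0x4
DUPLICATE = 0x400


def tiny_sam():
    """Coordinate-sorted SAM text: two identical mapped reads, two unmapped."""
    seq, qual = "ACGTACGT", "IIIIIIII"
    lines = [
        "@HD\tVN:1.6\tSO:coordinate",
        "@SQ\tSN:chr1\tLN:1000",
        "@RG\tID:rg1\tSM:sample1\tLB:lib1\tPL:ILLUMINA",
    ]
    for name in ("mapped_a", "mapped_b"):
        lines.append("\t".join(
            [name, "0", "chr1", "100", "60", "8M", "*", "0", "0",
             seq, qual, "RG:Z:rg1"]))
    # Reads without a position go last in a coordinate-sorted file.
    for name in ("unmapped_a", "unmapped_b"):
        lines.append("\t".join(
            [name, "4", "*", "0", "0", "*", "*", "0", "0",
             seq, qual, "RG:Z:rg1"]))
    return "\n".join(lines) + "\n"


def flag_summary(sam_text):
    """Count records by mapping status and duplicate flag."""
    counts = {"mapped": 0, "mapped_dup": 0, "unmapped": 0, "unmapped_dup": 0}
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        key = "unmapped" if flag & UNMAPPED else "mapped"
        if flag & DUPLICATE:
            key += "_dup"
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Save tiny_sam() to tiny.sam, then run something like (assumed setup):
    #   java -jar picard.jar MarkDuplicates I=tiny.sam O=marked.sam M=metrics.txt
    # and pass the contents of marked.sam to flag_summary().
    print(flag_summary(tiny_sam()))
```

If the behaviour described above holds, `unmapped_dup` should stay at zero for the marked file, and the `UNMAPPED_READS` column of the metrics file should reflect the two unmapped records.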
Answered by James Hawley on August 22, 2021