Bioinformatics Asked by trouselife on January 24, 2021
Sorry if this is a stupid question; I am new to analysing targeted hybrid capture data.
I am analysing my HTS data from a targeted hybrid capture enrichment followed by sequencing on a NextSeq.
Looking at my BAM files, I see reads that extend into regions outside my targets.
For example, my .bed file has two target regions on chr17:
However, I still have reads spanning 7578739-7578918 (the following example is at position 7578850):
Quite a lot of these reads don't hit any part of the target regions in my BED file. Are these mistakes? How were they pulled down for sequencing when they aren't in the target regions of the baits?
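One way to see which reads miss the targets is to test each alignment against the BED intervals. The sketch below is a minimal pure-Python illustration, not a replacement for dedicated tools. `load_bed`, `overlaps_target` and the `pad` parameter are names made up for this example, and the target coordinates are hypothetical because the question's actual BED lines were not preserved. It assumes BED's 0-based, half-open convention, so the 1-based position 7578850 shown in a genome browser is 7578849 in BED terms. `pad` is an assumed option for treating reads just beside a target separately, since a captured fragment can extend beyond the bait it hybridised to.

```python
from collections import defaultdict

def load_bed(lines):
    """Parse BED text lines into {chrom: [(start, end), ...]}.

    BED intervals are 0-based and half-open: 'chr17 100 200' covers
    1-based positions 101..200.
    """
    targets = defaultdict(list)
    for line in lines:
        fields = line.split()
        # skip blank lines, comments and UCSC track/browser header lines
        if not fields or fields[0].startswith(("#", "track", "browser")):
            continue
        targets[fields[0]].append((int(fields[1]), int(fields[2])))
    return dict(targets)

def overlaps_target(targets, chrom, start, end, pad=0):
    """Return True if the 0-based half-open read span [start, end) overlaps
    any target interval widened by `pad` bases on each side (hypothetical helper)."""
    return any(s - pad < end and start < e + pad
               for s, e in targets.get(chrom, []))

# Hypothetical targets, NOT the regions from the question.
bed = load_bed([
    "chr17\t7577000\t7577200",
    "chr17\t7579000\t7579200",
])
# A read covering 1-based positions 7578739-7578918 is [7578738, 7578918) in 0-based, half-open terms.
read = ("chr17", 7578738, 7578918)
print(overlaps_target(bed, *read))           # False: off-target
print(overlaps_target(bed, *read, pad=100))  # True: within 100 bp of the second target
```

The linear scan over intervals keeps the logic easy to check; for a whole BAM you would want an indexed approach or an existing tool.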
It depends strongly on your exact protocol. Note that there are a good number of tools for QC of capture data (here, here).
Off-target reads are a standard observation in capture experiments; recall that it is target enrichment, not perfect target purification, as commenter swbarnes2 suggests.
The real question is whether the proportion of those off-target reads is too high for your application, or otherwise suggests problems. Both depend on details of the technique that you will likely have to work out yourself.
Here are some tables from the supplement of the first of those QC papers, showing on-target proportions for various capture samples (sorry, it would have taken some effort to get the tables as text rather than images):
These data suggest that on-target reads are 60%-80% of mapped reads. Note that they also lose a lot of reads in mapping, so we cannot exclude the possibility of mis-mapped garbage reads contributing to off-target mapping.
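To put a number on "too high" for your own data, you can count reads as unmapped, on-target or off-target and express on-target reads as a share of mapped reads, the same denominator as the 60%-80% figure. This is a minimal sketch under stated assumptions: `capture_summary` is a hypothetical helper, reads are plain `(chrom, start, end)` tuples in 0-based half-open coordinates with `chrom=None` marking an unmapped read, and the target and reads are made up for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (chrom, start, end), 0-based half-open; chrom None marks an unmapped read
Read = Tuple[Optional[str], int, int]
Targets = Dict[str, List[Tuple[int, int]]]

def capture_summary(reads: Iterable[Read], targets: Targets) -> Dict[str, float]:
    """Classify reads as unmapped, on-target or off-target (hypothetical helper).

    The on-target percentage uses mapped reads as the denominator, so reads
    lost in mapping are reported separately instead of counting as off-target.
    """
    unmapped = on_target = off_target = 0
    for chrom, start, end in reads:
        if chrom is None:
            unmapped += 1
        elif any(s < end and start < e for s, e in targets.get(chrom, [])):
            on_target += 1
        else:
            off_target += 1
    mapped = on_target + off_target
    total = mapped + unmapped
    return {
        "total": total,
        "mapped": mapped,
        "on_target": on_target,
        "off_target": off_target,
        "pct_mapped": 100.0 * mapped / total if total else 0.0,
        "pct_on_target_of_mapped": 100.0 * on_target / mapped if mapped else 0.0,
    }

# Hypothetical target and reads, for illustration only.
targets = {"chr17": [(7577000, 7577200)]}
reads = [
    ("chr17", 7577050, 7577200),  # inside the target
    ("chr17", 7577100, 7577250),  # runs past the target end but still overlaps it
    ("chr17", 7576950, 7577010),  # starts before the target, overlaps its first bases
    ("chr17", 7578738, 7578918),  # no overlap: off-target
    (None, 0, 0),                 # unmapped
]
print(capture_summary(reads, targets))
```

For these toy reads, 3 of the 4 mapped reads are on target (75%) and 4 of 5 reads mapped (80%). With a real BAM, the tuples could be built from pysam records (`reference_name`, `reference_start`, `reference_end`, checking `is_unmapped` first), usually after dropping secondary, supplementary and duplicate alignments; dedicated QC tools such as Picard CollectHsMetrics report comparable metrics.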
Correct answer by Maximilian Press on January 24, 2021