Bioinformatics Asked by SPPearce on May 27, 2021
I have a bam file from paired-end sequencing, in which each R1 has the RX
tag assigned (via UMItools group
) denoting a Unique Molecular Identifier (UMI).
I want to call molecular consensus reads using fgbio
but fgbio GroupReadsByUmi
gives me an error, because it expects both R1 and R2 to have the RX
tag set.
I would like a tool that will go through my bam file and assign each R2 the same RX
tag as the R1.
The best thing would be to figure out how to do this in UMItools. I don't know that tool at all, but it seems that it is supposed to be able to deal with / tag both reads in paired data. e.g. did you use --paired
flag, etc.
I would suggest investigating that route first before you look at the next part of my answer.
Per Devon Ryan's comment suggestion I would suggest a workflow similar to this:
samtools sort -n
) to get the R1s immediately following the R2ssamtools view
and possibly samtools sort
back to a bam file of the same type that you started with.My pysam is a little rusty but something like this might work
import pysam
bam = pysam.AlignmentFile("bam", "rb")
# print header
print(bam.header)
tag = None
for read in bam:
r2 = False
if tag:
read.assign_umi_tag() # assign to read- not sure where/how
r2 = True
else:
tag = read.lookup_umi_tag() # i don't know where this tag is in bam format
print(read)
if r2:
tag = None
I haven't tested that in any way, caveat emptor.
I am not personally familiar with UMItools and how it assigns tags so can't suggest anything more specific, sorry.
Answered by Maximilian Press on May 27, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP