Bioinformatics Asked by user438383 on April 25, 2021
I have merged two different .bam files to simulate sample contamination, so each read comes from one of two samples, as shown by the read group info:
@RG ID:0 PL:ILLUMINA SM:LP4100018-DNA_C11_Proband PU:HGY3WDSXX:1:none
@RG ID:1 PL:ILLUMINA SM:LP4100018-DNA_C11_Proband PU:HGY3WDSXX:2:none
@RG ID:2 PL:ILLUMINA SM:LP4100018-DNA_C11_Proband PU:HGY3WDSXX:3:none
@RG ID:3 PL:ILLUMINA SM:LP4100018-DNA_C11_Proband PU:HGY3WDSXX:4:none
@RG ID:0-11EFC00B PL:ILLUMINA SM:LP4100018-DNA_E11_Proband PU:HGY3WDSXX:1:none
@RG ID:1-B8A1099 PL:ILLUMINA SM:LP4100018-DNA_E11_Proband PU:HGY3WDSXX:2:none
@RG ID:2-330086F PL:ILLUMINA SM:LP4100018-DNA_E11_Proband PU:HGY3WDSXX:3:none
@RG ID:3-7681F092 PL:ILLUMINA SM:LP4100018-DNA_E11_Proband PU:HGY3WDSXX:4:none
I’d like to check that the correct proportion of reads originates from each sample.
Currently I am using:
samtools view example.bam | rev | cut -f 1 | rev > output.txt
This is not very elegant, and it only works because the RG tag happens to be the last field of every record in this .bam.
Is there a quick way to tabulate the number of reads per read group ID? E.g. produce an output like:
ID:0 1000
ID:1 2000
ID:2 3000
...
A solution in samtools would be ideal, along the lines of the output produced by samtools stats.
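For reference, here is a minimal sketch of a position-independent version of the pipeline above. It assumes the read group is stored in the standard RG:Z: optional tag and that awk and sort are available. The function name count_rg and the "none" label for reads without an RG tag are my own placeholders, not part of samtools.

```shell
# count_rg (hypothetical helper): read SAM alignment records on stdin and
# print one "ID:<read group> <count>" line per read group.
# Optional tags start at column 12 of a SAM record, so every column from
# there on is scanned for RG:Z: instead of assuming it is the last field.
count_rg() {
    awk -F '\t' '
        {
            rg = "none"                      # assumed label for reads lacking an RG tag
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^RG:Z:/) {
                    rg = substr($i, 6)       # drop the leading "RG:Z:"
                    break
                }
            }
            counts[rg]++
        }
        END {
            for (rg in counts) print "ID:" rg, counts[rg]
        }
    ' | sort
}
```

It would be used as samtools view example.bam | count_rg > output.txt. Without -h, samtools view emits only alignment records, so the @RG header lines are not counted.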