Bioinformatics Asked on June 17, 2021
I have a .vcf file with the following header and first records:
##startTime=Fri Mar 29 16:46:32 2019
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
1 54586 . T C . PASS DP=39;MQ=50.55;MQ0=0;NT=ref;QSS=48;QSS_NT=48;ReadPosRankSum=1.92;SGT=TT->CT;SNVSB=0.00;SOMATIC;SomaticEVS=10.83;TQSS=1;TQSS_NT=1 AU:CU:DP:FDP:GU:SDP:SUBDP:TU 0,0:0,0:20:0:0,0:0:0:20,20 0,0:6,6:18:0:0,0:0:0:12,13
1 103241 . C T . PASS DP=120;MQ=24.94;MQ0=35;NT=ref;QSS=47;QSS_NT=47;ReadPosRankSum=2.09;SGT=CC->CT;SNVSB=0.00;SOMATIC;SomaticEVS=9.44;TQSS=2;TQSS_NT=2 AU:CU:DP:FDP:GU:SDP:SUBDP:TU 0,1:32,47:33:1:0,0:0:0:0,5 0,
The "DP" field in the VCF gives the read depth of each sample. In this file, the first locus has the following FORMAT and sample fields:
AU:CU:DP:FDP:GU:SDP:SUBDP:TU 0,0:0,0:20:0:0,0:0:0:20,20 0,0:6,6:18:0:0,0:0:0:12,13
According to the DP field of the normal and tumor samples, the normal sample has a depth of 20 and the tumor sample has a depth of 18.
How can I extract the read depth for all loci, as described for the first position? The desired output would look like this (note that the VCF is taken from my own data, but the table only shows the format I want and I don't know how to produce it from my data; the chr prefix was added manually because my reference genome is hg19):
Sample Type CHROM POS REF ALT Tumor_Depth Normal_Depth
CHC2432T SNV chr1 102961055 G A 64 62
CHC2432T SNV chr1 105492588 A T 66 73
CHC2432T SNV chr1 108628724 C T 45 54
CHC2432T SNV chr1 109692113 G T 53 29
CHC2432T SNV chr1 109692114 G T 53 31
CHC2432T SNV chr1 120676701 T C 48 87
To extract the DP fields from a VCF file, you could use a tool like bcftools query:
Extracts fields from VCF or BCF files and outputs them in user-defined format.
You could start from something like this:
bcftools query -Hf 'CHC2432T\t%TYPE\t%CHROM\t%POS\t%REF\t%ALT[\t%DP]\n' file.vcf
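If bcftools is not at hand, the same extraction can be sketched in plain Python. This is only a minimal illustration, not part of the original answer: it assumes a tab-separated VCF whose #CHROM line names the samples NORMAL and TUMOR (as in the question), and the function name extract_depths, the sample label, the chr prefix, and the crude SNV/INDEL rule are all hypothetical choices made for the example.

```python
def extract_depths(vcf_lines, sample_label="CHC2432T", chrom_prefix="chr",
                   tumor_name="TUMOR", normal_name="NORMAL"):
    """Yield (label, type, chrom, pos, ref, alt, tumor DP, normal DP) per record."""
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        # Locate DP inside the FORMAT column, e.g. AU:CU:DP:...
        dp_index = fields[8].split(":").index("DP")
        depth = {name: value.split(":")[dp_index]
                 for name, value in zip(samples, fields[9:])}
        # Rough assumption: single-base REF and ALT alleles mean an SNV.
        alleles = [ref] + alt.split(",")
        vtype = "SNV" if all(len(a) == 1 for a in alleles) else "INDEL"
        yield (sample_label, vtype, chrom_prefix + chrom, pos, ref, alt,
               depth[tumor_name], depth[normal_name])


# First record from the question, rebuilt with explicit tabs (INFO shortened).
header = "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
                    "INFO", "FORMAT", "NORMAL", "TUMOR"])
record = "\t".join(["1", "54586", ".", "T", "C", ".", "PASS", "DP=39;SOMATIC",
                    "AU:CU:DP:FDP:GU:SDP:SUBDP:TU",
                    "0,0:0,0:20:0:0,0:0:0:20,20",
                    "0,0:6,6:18:0:0,0:0:0:12,13"])

rows = list(extract_depths([header, record]))
for row in rows:
    print("\t".join(row))  # CHC2432T  SNV  chr1  54586  T  C  18  20 (tab-separated)
```

For a real file, the lines could be passed in with open("file.vcf") instead of the two-line example list.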
Answered by Jukka Matilainen on June 17, 2021
You can do the extraction part with the GATK tool VariantsToTable, as described here:
https://gatk.broadinstitute.org/hc/en-us/articles/360041414592-VariantsToTable
The usage example from that doc:
gatk VariantsToTable \
    -V input.vcf \
    -F CHROM -F POS -F TYPE -GF AD \
    -O output.table
would produce a file that looks like:
CHROM POS TYPE HSCX1010N.AD HSCX1010T.AD
1 31782997 SNP 77,0 53,4
1 40125052 SNP 97,0 92,7
1 65068538 SNP 49,0 35,4
1 111146235 SNP 69,1 77,4
You might still need to reorder columns and so on, but that should at least get the values out in a tabular format that is easier to work with.
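The column reordering mentioned above could be sketched in Python with the standard csv module. This is an illustration under assumptions rather than part of the original answer: it supposes the table was produced with -GF DP instead of -GF AD, so that the genotype columns are named NORMAL.DP and TUMOR.DP after the samples in the question's VCF, and reorder_table is a hypothetical helper name.

```python
import csv
import io


def reorder_table(table_text, columns):
    """Return a tab-separated table containing only `columns`, in that order."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns not listed are silently dropped
    return out.getvalue()


# Illustrative table in the assumed layout, using the depths of the
# first locus from the question (normal 20, tumor 18).
table = ("CHROM\tPOS\tTYPE\tNORMAL.DP\tTUMOR.DP\n"
         "1\t54586\tSNP\t20\t18\n")

reordered = reorder_table(table, ["CHROM", "POS", "TYPE", "TUMOR.DP", "NORMAL.DP"])
print(reordered, end="")
```

The same function could be applied to the contents of output.table after reading the file into a string.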
Answered by Geraldine_VdAuwera on June 17, 2021