Bioinformatics Asked on January 7, 2021
My question on SO was closed and I was asked to post it in this forum, so I am posting it here.
I am not from the bioinformatics domain. However, for the sake of analysis, I am trying to pick up certain basics related to the GT field in the VCF file.
I know we have a field called Alleles. Under what circumstances does GT take certain values, and what are those values called? Can you confirm my understanding?
Ref   Alt      GT    Name
A     A        0/0   Homozygous
A     G        0/1   Heterozygous (does 1/0 also mean the same?) What are this 0 and 1, actually?
A     [C,CA]   ??    ??
??    ??       1/1   HOM_ALT? How and why?
Can the experts here help me fill in the question marks, and also help me understand, for the Ref and Alt combinations, when a genotype can take a value of 0/0, 1/0, 1/1, 1/2, etc., and what the names for those values are? For example, when is it called HOM_ALT?
Any simple explanation for a beginner like me (with no background in bioinformatics/biology) would be helpful.
You can get most of the info from this paper (Danecek et al., 2011, listed under REFERENCES). See Fig. 1 and the surrounding text. Quoting from there, "GT, genotype, encodes alleles as numbers: 0 for the reference allele, 1 for the first allele listed in ALT column, 2 for the second allele listed in ALT and so on."
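In your case, the reference allele, here a single nucleotide, is A. When both copies in the sample are also A, each allele is coded as the reference, or 0. (In an actual VCF record the ALT column lists only non-reference alleles, so it would not repeat A.) There are 2 copies of each non-sex chromosome (chr1-chr22) in the human genome, and so 2 alleles at each locus, hence 0/0, which is called homozygous reference, or HOM_REF.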
When ALT=G, and the GT column is 0/1, this means that you have 1 reference allele (0), and 1 alternate allele (1). This means that you have A on one copy of this locus, and G on the other. The convention is to write the GT field in ascending order, so 0/1 rather than 1/0. This is called heterozygous, or HET.
When ALT=C,CA, the GT is probably 1/2, because there are 2 alternate alleles, and I assume we continue with the same chromosome present in 2 copies. This means there are no reference alleles here at all, only alternate alleles. It is a heterozygous genotype composed of two different ALT alleles, or HET_ALT. Note that it is not enclosed in square brackets in the vcf file format: A <tab> C,CA <tab> 1/2 ...
Finally, these are some examples of HOM_ALT:
A C 1/1
A G 1/1
A CA 1/1
This means that the same ALT allele (either C, or G, or CA) is present in 2 copies. There is no reference allele present. This is called homozygous alternate genotype.
In general, the name homo means the same, and hetero means different, in the context of genotypes.
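The naming rules above can be sketched in a few lines of Python. This is only an illustration, not part of any VCF library: the function name classify_genotype is hypothetical, it assumes a diploid genotype, and the MISSING label for ./. is my own shorthand for a genotype that was not called.

```python
def classify_genotype(gt: str) -> str:
    """Name a diploid GT value such as '0/1' (hypothetical helper)."""
    # Split on either separator: '/' (unphased) or '|' (phased).
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected a diploid genotype, got {gt!r}")
    if "." in alleles:
        return "MISSING"  # assumption: label for an uncalled genotype like './.'
    low, high = sorted(int(a) for a in alleles)
    if low == high:
        # Same allele twice: reference (0) or one of the ALT alleles (1, 2, ...).
        return "HOM_REF" if low == 0 else "HOM_ALT"
    # Two different alleles: HET if one is the reference, HET_ALT otherwise.
    return "HET" if low == 0 else "HET_ALT"


for gt in ["0/0", "0/1", "1/0", "1/2", "1/1"]:
    print(gt, classify_genotype(gt))
```

Because the indices are sorted before comparison, 0/1 and 1/0 receive the same name.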
REFERENCES:
Danecek P, Auton A, Abecasis G, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156-2158. doi:10.1093/bioinformatics/btr330 : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3137218/
SEE ALSO:
VCF - Variant Call Format
What Does Genotype ("0/0", "0/1" Or "1/1") In *.Vcf File Represent?
Difference between 0/0 and ./. for genotype in VCF
1/2 in VCF genotype field?
Correct answer by Timur Shtatland on January 7, 2021
0 and 1 are just a way of coding the reference (0) and alternate (1) alleles. These could be A/G, C/T, etc. It's just a simplified way of expressing the different alleles.
0/1 and 1/0 functionally mean the same thing (that the individual is a heterozygote): since the genotype is unphased, the alleles aren't ordered. The / symbol tells you the genotype is unphased. However, in practice, a heterozygous genotype is usually written as 0/1 as a matter of convention. 0/0 is referred to as homozygous reference, and 1/1 as homozygous alternative.
0|1 and 1|0 do mean something different, since the pipe symbol | tells us the order of the alleles matters.
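To make the difference between / and | concrete, here is a small Python sketch. The function name same_genotype is hypothetical, and comparing a phased genotype with an unphased one as unordered is my own assumption for the mixed case.

```python
def same_genotype(gt_a: str, gt_b: str) -> bool:
    """Return True if two GT strings describe the same genotype (hypothetical helper)."""
    both_phased = "|" in gt_a and "|" in gt_b
    alleles_a = gt_a.replace("|", "/").split("/")
    alleles_b = gt_b.replace("|", "/").split("/")
    if both_phased:
        # Phased ('|'): allele order matters, so compare position by position.
        return alleles_a == alleles_b
    # Unphased ('/'): order is not meaningful, so compare as unordered collections.
    # Assumption: a phased/unphased pair is also compared without regard to order.
    return sorted(alleles_a) == sorted(alleles_b)


print(same_genotype("0/1", "1/0"))
print(same_genotype("0|1", "1|0"))
```

The first call treats the two unphased heterozygotes as equal, while the second distinguishes the two phased ones.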
Occasionally you will have multi-allelic positions, where there is more than one alternate allele, and the fields will look like:
Ref   Alt
A     [C,GT]
The extra alt allele is given the number 2, so in this case an individual carrying one A allele and one GT allele would have the genotype code 0/2.
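The numbering can be checked by turning a GT code back into allele sequences. This is a minimal sketch, not a real VCF parser: decode_gt is a hypothetical name, and ALT is passed the way it appears in a VCF file (comma-separated, without square brackets).

```python
def decode_gt(ref: str, alt: str, gt: str) -> list[str]:
    """Translate GT indices into allele sequences (hypothetical helper)."""
    # Index 0 is REF; indices 1, 2, ... follow the order of the ALT column.
    alleles = [ref] + alt.split(",")
    indices = gt.replace("|", "/").split("/")
    # Keep '.' (a missing allele call) as-is instead of looking it up.
    return ["." if i == "." else alleles[int(i)] for i in indices]


print(decode_gt("A", "C,GT", "0/2"))  # ['A', 'GT']
print(decode_gt("A", "C,GT", "1/2"))  # ['C', 'GT']
```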
Full details are given in the official VCF specification.
Answered by user438383 on January 7, 2021