Bioinformatics Asked by revl on August 22, 2021
I’m writing a VCF parser, so I have to consider and handle all corner cases, regardless of how contrived they may seem. The specification is a bit unclear about the MISSING value (‘.’) in the ALT column:
Options are base Strings made up of the bases A,C,G,T,N,*, (case insensitive) or a MISSING value ‘.’ (no variant) or an angle-bracketed ID String (“<ID>”) or a breakend replacement string.
I’ve seen examples with a single dot in the ALT column:
4 31789170 PTV021 G . 77 PASS .
The question is whether the following data lines are also valid:
1 12345 ID1 A .,T,. 22.88 PASS .
1 12346 ID2 G C,. 22.88 PASS .
In other words, does MISSING indicate that the entire ALT field is missing, or does it mean that an individual allele is missing?
By extension, how do I represent the case where there’s a single dot in the ALT field (as in the first example)? Is it an empty list (because the whole field is MISSING), or is it a list containing a MISSING value? In other words, is it [] or ["."]?
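To make the two candidate representations concrete, here is a minimal Python sketch of both readings. The function names are hypothetical and chosen only for illustration, and the code does not settle which reading the specification intends; it only shows what each one would produce for the data lines above.

```python
from typing import List, Optional

MISSING = "."  # the MISSING value as it appears in the raw column


def split_alt(field: str) -> List[str]:
    """Split the raw ALT column on commas without interpreting '.'."""
    return field.split(",")


def alt_as_empty_list(field: str) -> List[str]:
    """Reading 1 (assumption): a lone '.' means the whole field is missing."""
    return [] if field == MISSING else split_alt(field)


def alt_with_missing_alleles(field: str) -> List[Optional[str]]:
    """Reading 2 (assumption): every '.' is a missing allele, kept as None."""
    return [None if allele == MISSING else allele for allele in split_alt(field)]


for raw in (".", ".,T,.", "C,."):
    print(raw, alt_as_empty_list(raw), alt_with_missing_alleles(raw))
```

Under the first reading, a dot that appears next to other alleles (as in the ID1 and ID2 lines) is left as a literal "." string, so the parser would still need a separate rule to accept or reject it.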