Bioinformatics Asked on April 4, 2021
I wondered how would I be able to obtain variant based on chrom and location range.
Optimally, this function would be like:
import vcf
vcf_reader = vcf.Reader(filename=some_file_name)
vcf_read = vcf.variants_between(CHROM, START, END)
Does anyone how to do so?
I think vcf_reader.fetch
is what you want. You have to index the file first though.
If starting from regular text VCF you can run:
bgzip example.vcf
tabix example.vcf.gz
Then the Python code is almost the same as yours:
import vcf
vcf_reader = vcf.Reader(filename="example.vcf.gz")
vcf_read = vcf_reader.fetch("NC_023663.1", 7803980, 7804000)
for record in vcf_read:
print(record)
For this example VCF you get:
Record(CHROM=NC_023663.1, POS=7803981, REF=G, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803982, REF=T, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803983, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803984, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803985, REF=G, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803986, REF=G, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803987, REF=G, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803988, REF=C, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803989, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803990, REF=T, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803991, REF=A, ALT=[G, <*>])
Record(CHROM=NC_023663.1, POS=7803992, REF=G, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803993, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803994, REF=C, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803995, REF=T, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803996, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803997, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803998, REF=C, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803999, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7804000, REF=T, ALT=[<*>])
API reference: https://pyvcf.readthedocs.io/en/latest/API.html#vcf.Reader.fetch
Note the "zero-based, half-open coordinate system" it describes (which is why 7803980 in the second Python argument becomes POS=7803981 from the VCF but then it stops at POS=7804000).
Correct answer by Jesse on April 4, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP