TransWikia.com

Is there a public RESTful API for gnomAD?

Bioinformatics Asked by Pasted on May 14, 2021

I currently find Harvard’s RESTful API for ExAC extremely useful, and I was hoping that a similar resource is available for gnomAD.

Does anyone know of a public-access API for gnomAD, or of any plans to integrate gnomAD into the Harvard API?

5 Answers

As far as I know, no. However, the vcf.gz files sit behind an HTTP server that supports Byte-Range requests, so you can query them remotely with tabix or any related API:

$ tabix "https://storage.googleapis.com/gnomad-public/release-170228/vcf/exomes/gnomad.exomes.r2.0.1.sites.vcf.gz" "22:17265182-17265182"
22  17265182    .   A   T   762.04  PASS    AC=1;AF=4.78057e-06;AN=209180;BaseQRankSum=-4.59400e+00;ClippingRankSum=2.18000e+00;DP=4906893;FS=1.00270e+01;InbreedingCoeff=4.40000e-03;MQ=3.15200e+01;MQRankSum=1.40000e+00;QD=1.31400e+01;ReadPosRankSum=2.23000e-01;SOR=9.90000e-02;VQSLOD=-5.12800e+00;VQSR_culprit=MQ;GQ_HIST_ALT=0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1;DP_HIST_ALT=0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0;AB_HIST_ALT=0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0;GQ_HIST_ALL=1591|589|120|301|650|589|1854|2745|1815|4297|5061|2921|10164|1008|6489|1560|7017|457|6143|52950;DP_HIST_ALL=2249|1418|6081|11707|16538|9514|28624|23829|7391|853|95|19|1|0|0|1|0|1|0|0;AB_HIST_ALL=0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0;AC_AFR=0;AC_AMR=0;AC_ASJ=0;AC_EAS=0;AC_FIN=1;AC_NFE=0;AC_OTH=0;AC_SAS=0;AC_Male=1;AC_Female=0;AN_AFR=11994;AN_AMR=31324;AN_ASJ=7806;AN_EAS=13112;AN_FIN=20076;AN_NFE=94516;AN_OTH=4656;AN_SAS=25696;AN_Male=114366;AN_Female=94814;AF_AFR=0.00000e+00;AF_AMR=0.00000e+00;AF_ASJ=0.00000e+00;AF_EAS=0.00000e+00;AF_FIN=4.98107e-05;AF_NFE=0.00000e+00;AF_OTH=0.00000e+00;AF_SAS=0.00000e+00;AF_Male=8.74386e-06;AF_Female=0.00000e+00;GC_AFR=5997,0,0;GC_AMR=15662,0,0;GC_ASJ=3903,0,0;GC_EAS=6556,0,0;GC_FIN=10037,1,0;GC_NFE=47258,0,0;GC_OTH=2328,0,0;GC_SAS=12848,0,0;GC_Male=57182,1,0;GC_Female=47407,0,0;AC_raw=1;AN_raw=216642;AF_raw=4.61591e-06;GC_raw=108320,1,0;GC=104589,1,0;Hom_AFR=0;Hom_AMR=0;Hom_ASJ=0;Hom_EAS=0;Hom_FIN=0;Hom_NFE=0;Hom_OTH=0;Hom_SAS=0;Hom_Male=0;Hom_Female=0;Hom_raw=0;Hom=0;POPMAX=FIN;AC_POPMAX=1;AN_POPMAX=20076;AF_POPMAX=4.98107e-05;DP_MEDIAN=58;DREF_MEDIAN=5.01187e-84;GQ_MEDIAN=99;AB_MEDIAN=6.03448e-01;AS_RF=9.18451e-01;AS_FilterStatus=PASS;CSQ=T|missense_variant|MODERATE|XKR3|ENSG00000172967|Transcript|ENST00000331428|protein_coding|4/4||ENST00000331428.5:c.707T>A|ENSP00000331704.5:p.Phe236Tyr|810|707|236|F/Y|tTc/tAc||1||-1||SNV|1|HGNC|28778|YES|||CCDS42975.1|ENSP00000331704|Q5GH77||UPI000013EFAE||deleterious(0)|benign(0.055)|hmmpanther:PTHR14297&hmmpanther:PTHR14297:SF7&Pfam_domain:PF09815||||||||||||||||||||||||||||||,T|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00000672806|TF_binding_site|||||||||||1||||SNV|1||||||||||||||||||||||||||||||||||||||||||||,T|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00001729562|CTCF_binding_site|||||||||||1||||SNV|1||||||||||||||||||||||||||||||||||||||||||||

UPDATE (2019): the current gnomAD server doesn't support Byte-Range requests.
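Whether the record comes from a remote query or from a locally downloaded VCF, the INFO column (the long `KEY=VALUE;...` string above) is usually the part you want to pick apart. A minimal sketch of doing that in plain Python follows; the helper name `parse_info` is hypothetical, and the sample string is a shortened selection of fields from the record above:

```python
def parse_info(info_field):
    """Split a VCF INFO column into a dict; flag-style keys (no '=') map to True."""
    parsed = {}
    for entry in info_field.split(";"):
        if not entry:
            continue
        # Split on the first '=' only, so values containing '=' stay intact.
        key, sep, value = entry.partition("=")
        parsed[key] = value if sep else True
    return parsed


# A few fields copied from the tabix record shown above.
info = parse_info("AC=1;AF=4.78057e-06;AN=209180;AF_FIN=4.98107e-05;POPMAX=FIN")
allele_frequency = float(info["AF"])
print(info["POPMAX"], allele_frequency)
```

Values are kept as strings on purpose, since INFO fields mix integers, floats, lists and pipe-delimited annotations; convert only the fields you need.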

Correct answer by Pierre on May 14, 2021

You can browse gnomAD variants with the ClinGen Allele Registry (an API specification is available).

Answered by user1690 on May 14, 2021

The new gnomAD site (as of August 2019) says there is no API yet:

How do I query a batch of variants? Do you have an API?

We currently do not have a way to submit batch queries on the browser, but we are actively working on developing an API for ExAC/gnomAD. If you would like to learn about GraphQL, which we will use to work with the API, an overview can be found at https://graphql.org. You can also obtain information on all variants from the VCFs and Hail Tables available on our downloads page.

However, the web interface itself already makes POST requests to https://gnomad.broadinstitute.org/api to send and receive JSON/GraphQL, so you can make those same queries programmatically right now, even though it's not officially a public API.

Querying that endpoint from Python to get some basic info on variants for a particular gene gives you simple nested Python objects to work with, such as:

  { 'consequence': 'intron_variant',
    'pos': 6928442,
    'rsid': 'rs782435448',
    'variant_id': '12-6928442-C-A'},
  { 'consequence': 'splice_region_variant',
    'pos': 6928462,
    'rsid': None,
    'variant_id': '12-6928462-C-A'},
  { 'consequence': 'splice_acceptor_variant',
    'pos': 6928464,
    'rsid': 'rs782577109',
    'variant_id': '12-6928464-G-A'},
  { 'consequence': 'missense_variant',
    'pos': 6928466,
    'rsid': 'rs782208003',
    'variant_id': '12-6928466-C-T'},

(I found it useful to go this route because the full metadata visible in the gnomAD web interface is then available, including per-variant details such as allele counts by population. I couldn't find this information in the other APIs described here.)
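Because the result is just a list of dictionaries, ordinary Python is enough to filter it. The sketch below reuses the four records shown above; the helper names `by_consequence` and `split_variant_id` are hypothetical:

```python
# Records copied from the excerpt above.
variants = [
    {"consequence": "intron_variant", "pos": 6928442,
     "rsid": "rs782435448", "variant_id": "12-6928442-C-A"},
    {"consequence": "splice_region_variant", "pos": 6928462,
     "rsid": None, "variant_id": "12-6928462-C-A"},
    {"consequence": "splice_acceptor_variant", "pos": 6928464,
     "rsid": "rs782577109", "variant_id": "12-6928464-G-A"},
    {"consequence": "missense_variant", "pos": 6928466,
     "rsid": "rs782208003", "variant_id": "12-6928466-C-T"},
]


def by_consequence(records, consequence):
    """Keep only the records annotated with the given consequence term."""
    return [r for r in records if r["consequence"] == consequence]


def split_variant_id(variant_id):
    """Split a 'chrom-pos-ref-alt' identifier into its four parts."""
    chrom, pos, ref, alt = variant_id.split("-")
    return chrom, int(pos), ref, alt


missense = by_consequence(variants, "missense_variant")
without_rsid = [r["variant_id"] for r in variants if r["rsid"] is None]
print(missense, without_rsid)
```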

Answered by Jesse on May 14, 2021

I faced the same issue recently and found these links and a Python script:

gnomAD GraphQL API: https://gnomad.broadinstitute.org/api It works great, but GraphQL is a different kind of query language from a typical REST interface. See the docs here: https://graphql.org/learn/queries/

gnomAD Python API: https://github.com/furkanmtorun/gnomad_python_api
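To give a feel for how GraphQL differs from a REST call: the query is a block of text naming exactly the fields you want, and it travels inside a JSON body alongside any variables. The sketch below only builds that body and sends nothing. The field names are the ones used elsewhere in this thread; the type names `DatasetId` and `ReferenceGenomeId` are assumptions about the gnomAD schema and should be checked against the live schema, and `build_request_body` is a hypothetical helper:

```python
import json

# Assumption: DatasetId and ReferenceGenomeId are guesses at the schema's
# type names; the field names are taken from the queries in this thread.
GENE_VARIANTS_QUERY = """
query GeneVariants($geneId: String!, $dataset: DatasetId!, $genome: ReferenceGenomeId!) {
  gene(gene_id: $geneId, reference_genome: $genome) {
    variants(dataset: $dataset) {
      consequence
      pos
      rsid
      variant_id: variantId
    }
  }
}
"""


def build_request_body(gene_id, dataset, genome):
    """Return the JSON text that a GraphQL-over-HTTP POST would carry."""
    return json.dumps({
        "query": GENE_VARIANTS_QUERY,
        "variables": {"geneId": gene_id, "dataset": dataset, "genome": genome},
    })


# gnomAD v2.1 is GRCh37-based, so the dataset and genome are paired accordingly.
body = build_request_body("ENSG00000010610", "gnomad_r2_1", "GRCh37")
print(body)
```

Passing values as variables rather than formatting them into the query string avoids quoting problems, at the cost of having to declare each variable's type.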

Answered by John t_eckerd on May 14, 2021

I found Jesse's code quite useful! For those trying to reproduce it, you now need to add the reference genome ID, for example:

#!/usr/bin/env python

import pprint

import requests

prettyprint = pprint.PrettyPrinter(indent=2).pprint

def fetch(jsondata, url="https://gnomad.broadinstitute.org/api"):
    # The server gives a generic error message if the content type isn't
    # explicitly set
    headers = {"Content-Type": "application/json"}
    response = requests.post(url, json=jsondata, headers=headers)
    payload = response.json()
    # GraphQL reports failures in an "errors" list in the response body.
    if "errors" in payload:
        raise Exception(str(payload["errors"]))
    return payload

def get_variant_list(gene_id, dataset="gnomad_r2_1", reference_genome="GRCh37"):
    # The reference genome should match the build the dataset is aligned to:
    # gnomAD v2.1 (gnomad_r2_1) is GRCh37-based, so pass GRCh38 only together
    # with a GRCh38-aligned dataset.
    # Note that this is GraphQL, not JSON.
    fmt_graphql = """
    {
        gene(gene_id: "%s", reference_genome: %s) {
          variants(dataset: %s) {
            consequence
            pos
            rsid
            variant_id: variantId
          }
        }
      }
    """
    # This part will be JSON encoded, but with the GraphQL part left as a
    # glob of text. The query declares no variables, so none are sent.
    req_variantlist = {
        "query": fmt_graphql % (gene_id, reference_genome, dataset)
        }
    response = fetch(req_variantlist)
    return response["data"]["gene"]["variants"]

prettyprint(get_variant_list("ENSG00000010610"))
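The `fetch` helper above raises a bare `Exception` carrying the stringified `errors` list. If more readable failures are wanted, a sketch like the following could replace that check. It relies only on the GraphQL convention that each entry in `errors` is an object with a `message` field; the function name `unwrap_graphql` and the sample error text are made up for illustration:

```python
def unwrap_graphql(payload):
    """Return payload['data'], or raise with the server's error messages."""
    errors = payload.get("errors")
    if errors:
        # Each GraphQL error object carries a human-readable "message".
        messages = "; ".join(e.get("message", str(e)) for e in errors)
        raise RuntimeError("GraphQL request failed: " + messages)
    return payload.get("data")


# Exercised here with hand-written payloads rather than a live response.
ok = unwrap_graphql({"data": {"gene": {"variants": []}}})
try:
    unwrap_graphql({"errors": [{"message": "Gene not found"}]})
except RuntimeError as exc:
    failure = str(exc)
print(ok, failure)
```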

Answered by BretSnoop on May 14, 2021
