Bioinformatics Asked by Rob John on September 1, 2020
I have a list of dbSNP rsIDs for GRCh37 and I want to convert them to the equivalent IDs in GRCh38. I'm using the most recent dbSNP build (150 at the time of this post). Is there any ID mapping available? If not, are there any tools I can use?
Have you tried UCSC's liftOver tool? You'll need a BED file with your SNPs' coordinates for this.
You can also do this in R with rtracklayer:
library(rtracklayer)  # Bioconductor package that provides liftOver()
?liftOver             # open the help page for liftOver()
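A minimal Python sketch of preparing that BED file, assuming you already have each SNP's GRCh37 chromosome and 1-based position (the input tuples and the function name snps_to_bed are hypothetical). BED coordinates are 0-based and half-open, so a SNP at 1-based position p is written with start p-1 and end p.

```python
def snps_to_bed(snps):
    """Convert (chrom, 1-based pos, rsid) tuples into BED text.

    The input format is a hypothetical example; BED is 0-based and
    half-open, so a single base at position p spans [p - 1, p).
    """
    out = []
    for chrom, pos, rsid in snps:
        # UCSC chain files use 'chr'-prefixed names (chr1, chrX, ...),
        # so add the prefix when the input uses bare names like '1'.
        name = chrom if chrom.startswith("chr") else f"chr{chrom}"
        out.append(f"{name}\t{pos - 1}\t{pos}\t{rsid}\n")
    return "".join(out)


if __name__ == "__main__":
    print(snps_to_bed([("1", 249222527, "my_snp")]), end="")
```

The resulting file can then be given to the UCSC liftOver command-line tool with a GRCh37-to-GRCh38 chain file such as hg19ToHg38.over.chain.gz, e.g. liftOver snps_hg19.bed hg19ToHg38.over.chain.gz snps_hg38.bed unmapped.bed (file names here are illustrative).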
Correct answer by benn on September 1, 2020
See also CrossMap: http://crossmap.sourceforge.net/
Answered by Dan on September 1, 2020
You can assume that the overwhelming majority of rsIDs are the same between GRCh37 and GRCh38 (they're semi-stable IDs). There are, however, a number of rsIDs that are present only in GRCh37, which you can find here. Note that the format of this file is a bit strange: it's chromosome|position|ID|weight, where position is sometimes empty, weight is typically 1, and ID is the rsID without the rs prefix.
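A small Python sketch for reading that file and flagging which of your rsIDs are GRCh37-only, assuming every non-empty line has exactly the four pipe-separated fields described above and that weight is an integer. The function names and the sample lines in the usage are hypothetical.

```python
def parse_grch37_only(lines):
    """Parse 'chromosome|position|ID|weight' lines.

    position may be empty (returned as None); ID lacks the 'rs' prefix,
    so it is added back. weight is assumed to be an integer.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chrom, pos, snp_id, weight = line.split("|")
        records.append({
            "chrom": chrom,
            "pos": int(pos) if pos else None,
            "rsid": f"rs{snp_id}",
            "weight": int(weight),
        })
    return records


def grch37_only_hits(my_rsids, records):
    """Return those of my_rsids (in input order) that appear in the file."""
    known = {r["rsid"] for r in records}
    return [rsid for rsid in my_rsids if rsid in known]


if __name__ == "__main__":
    recs = parse_grch37_only(["1|100|111|1", "2||222|1"])  # made-up lines
    print(grch37_only_hits(["rs222", "rs999"], recs))
```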
Also, some rsIDs have a different strand. This is less likely to be an issue for you, but it's there in case you need it.
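If you do need to handle a strand difference, alleles reported on the opposite strand have to be reverse-complemented before comparing them. A minimal Python sketch (the helper names are hypothetical, not part of any dbSNP tool); note that for A/T and C/G SNPs the flipped alleles are the same pair, so strand cannot be inferred from the alleles alone.

```python
# Translation table mapping each base to its complement (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def flip_allele(allele):
    """Return the allele as read on the opposite strand (reverse complement)."""
    return allele.translate(_COMPLEMENT)[::-1]


def is_strand_ambiguous(ref, alt):
    """True for biallelic SNPs such as A/T or C/G, whose flip is the same pair."""
    return flip_allele(ref) == alt


if __name__ == "__main__":
    print(flip_allele("AG"), is_strand_ambiguous("A", "T"))
```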
Answered by Devon Ryan on September 1, 2020
Only the base positions changed between GRCh37 and GRCh38, not the IDs. For example, if a study (e.g. one found via PubMed) refers to a particular rsID, the genome build does not matter.
rsIDs do not depend on a reference genome. They point to a specific locus regardless of differences between genome assemblies. As a result, you cannot convert the rsIDs of one genome build into rsIDs of another build, and there is no such mapping tool available, whatever some tools may claim. rsIDs are assigned by the dbSNP maintainers and are stable. The numbers change only on a merge: when multiple numbers were assigned to the same genomic location, the dbSNP maintainers merge the higher number into the lower one. The exception is a very well-published, notable rsID mentioned in the press; in that case the notable number is kept, not the smaller one.
The RefSNP number is a stable accession regardless of differences between genome assemblies.
For more information, see the “RefSNP Number Stability” section at https://www.ncbi.nlm.nih.gov/books/NBK21088/
rsIDs may also be deleted in some cases; see the following links for an explanation:
Thus, specifying a location by rsID is stable: it can be affected only by an eventual merge or deletion in the dbSNP database, not by a new genome build. You only need to “update” the rsIDs from time to time to make sure they are all the main rsIDs, not larger numbers that have already been merged into a smaller one. Note, however, that the base positions (offsets) of the same genomic location may change between builds of the reference genome, and they certainly did change between GRCh37 and GRCh38.
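Updating to the main rsID amounts to following merges until you reach an ID that has not itself been merged. A minimal Python sketch, assuming you have already built a dictionary from each retired rsID to the rsID it was merged into (for example from dbSNP's merge records); the dictionary and the function name are hypothetical. The function follows the whole chain in case one merge target was later merged again, and stops on a cycle rather than looping forever.

```python
def resolve_rsid(rsid, merged_into):
    """Follow merge records from rsid to an ID that is not merged.

    merged_into: hypothetical dict {retired rsID: rsID it was merged into}.
    IDs absent from the dict are returned unchanged.
    """
    seen = set()
    current = rsid
    while current in merged_into:
        if current in seen:
            raise ValueError(f"merge cycle detected at {current}")
        seen.add(current)
        current = merged_into[current]
    return current


if __name__ == "__main__":
    merges = {"rs300": "rs200", "rs200": "rs100"}  # made-up example
    print([resolve_rsid(r, merges) for r in ["rs300", "rs5"]])
```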
If your list is relatively small, you can get the most recent rsIDs for a given chromosome and offset via the dbSNP API (NCBI E-utilities), documented at https://www.ncbi.nlm.nih.gov/snp/docs/eutils_help/
The API uses the same query syntax as the dbSNP “advanced search” at https://www.ncbi.nlm.nih.gov/snp/advanced
If you specify data in a “Base Position” line, it searches by GRCh38 base position. To specify a GRCh37 base position, use the “Base Position Previous” line instead. You may also need to add NOT MergedRS[All Fields] to exclude rsIDs that have already been merged.
For example, to get the rsID for GRCh37 chromosome 1, base position 249222527, the API query URL is: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=snp&term=1[Chromosome]+AND+249222527[Base+Position+Previous]+NOT+MergedRS[All+Fields]
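The same URL can be built programmatically. A minimal Python sketch using only the standard library; the function names are hypothetical, and the search fields ([Chromosome], [Base Position Previous], [Base Position], MergedRS[All Fields]) are the ones quoted in this answer. urlencode percent-encodes the square brackets (%5B, %5D), which the server decodes to the same query as the literal form.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(chrom, pos, grch37=True, exclude_merged=True):
    """Build a dbSNP search term for one chromosome/position."""
    # "Base Position Previous" searches GRCh37 coordinates, "Base Position" GRCh38.
    field = "Base Position Previous" if grch37 else "Base Position"
    term = f"{chrom}[Chromosome] AND {pos}[{field}]"
    if exclude_merged:
        term += " NOT MergedRS[All Fields]"  # drop rsIDs already merged away
    return term


def build_esearch_url(chrom, pos, grch37=True, exclude_merged=True):
    """Return an esearch URL for db=snp with the term above."""
    params = {"db": "snp", "term": build_term(chrom, pos, grch37, exclude_merged)}
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_esearch_url("1", 249222527))
```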
You can run such queries from a script if the number of requests is relatively small. You may find, for example, a Perl script that does the job.
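A minimal Python sketch of such a script, assuming you already have the esearch URLs (for instance the example URL above). It waits 0.34 s between requests to stay under NCBI's limit of 3 requests per second for clients without an API key, and reads the Id elements of the esearch XML reply; it assumes that for db=snp these IDs are rs numbers without the rs prefix. The function names are hypothetical, and fetch is injectable so the logic can be tried without network access.

```python
import time
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def parse_esearch_ids(xml_body):
    """Return the <IdList><Id> values of an esearch reply, prefixed with 'rs'."""
    root = ET.fromstring(xml_body)
    return ["rs" + el.text for el in root.findall("./IdList/Id")]


def _default_fetch(url):
    """Plain HTTP GET returning the response body as bytes."""
    with urlopen(url) as resp:
        return resp.read()


def run_queries(urls, fetch=_default_fetch, delay=0.34):
    """Run esearch URLs one at a time, pausing `delay` seconds between them."""
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # throttle: about 3 requests/second without an API key
        results[url] = parse_esearch_ids(fetch(url))
    return results
```

With an NCBI API key the permitted rate is higher, so the delay could be shortened; for long lists, bulk dbSNP downloads are usually a better fit than one request per position.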
Even though rsID numbers can be merged or deleted, with the changes reflected in each new dbSNP build, these changes do not depend on the genome build, so it is best to always rely on the rsID number from the latest dbSNP build.
Changes to rsID numbers due to merges or deletions between dbSNP builds are relatively infrequent, given the total number of IDs in the database.
Answered by Maxim Masiutin on September 1, 2020