Bioinformatics Asked by Equinox on June 23, 2021
This question has also been asked on Biostars and StackOverflow
I’ve been trying to code (in R) a way to convert gene accession numbers to gene names (from RNAseq data). I’ve looked at all the related questions and tried to modify my code accordingly, but for some reason it’s still not working. Here is my code, where charg is a character vector of the gene accession IDs of the data set resdata:
charg <- resdata$genes
head(charg)
library(biomaRt)
ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")
theBM <- getBM(attributes='ensembl_gene_id','hgnc_symbol',
filters = 'external_gene_name',
values = charg,
mart = ensembl)
resdata <- merge.data.frame(resdata, theBM, by.x="genes",by.y="ensembl_gene_id")
Here’s some output (where I’m struggling):
> head(charg)
[1] "ENSG00000261150.2" "ENSG00000164877.18" "ENSG00000120334.15"
[4] "ENSG00000100906.10" "ENSG00000182759.3" "ENSG00000124145.6"
> dim(theBM)
[1] 0 1
> head(theBM)
[1] ensembl_gene_id
<0 rows> (or 0-length row.names)
> dim(resdata)
[1] 20381 11
> resdata <- merge.data.frame(resdata, theBM, by.x="genes",by.y="ensembl_gene_id")
> dim(resdata) #after merge
[1] 0 11 #isn't correct -- just row names! where'd my genes go?
Thank you.
This is the code to get a look-up table to convert between Ensembl ID and HGNC:
ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")
theBM <- getBM(attributes=c('ensembl_gene_id','hgnc_symbol'),
filters = c('ensembl_gene_id'),
values = gsub("\\..*", "", charg),
mart = ensembl)
What Devon posted is correct but is missing a c() around the attributes values.
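To illustrate why the missing c() matters, here is a small sketch using a toy function. `toy_query` and its `other` argument are hypothetical stand-ins, not part of biomaRt; only the first few argument names are borrowed from getBM. Without c(), R sees the second string as a separate positional argument rather than as a second attribute.

```r
# Toy stand-in for illustration only; this is NOT biomaRt::getBM.
toy_query <- function(attributes, filters = "", values = "", mart = NULL, other = NULL) {
  list(attributes = attributes, other = other)
}

# Without c(): 'hgnc_symbol' is a separate unnamed argument. Named arguments
# are matched first, so it lands in the next unmatched formal (`other` here).
wrong <- toy_query(attributes = 'ensembl_gene_id', 'hgnc_symbol',
                   filters = 'ensembl_gene_id',
                   values = 'ENSG00000261150',
                   mart = "placeholder")

# With c(): both names are passed together as one character vector.
right <- toy_query(attributes = c('ensembl_gene_id', 'hgnc_symbol'),
                   filters = 'ensembl_gene_id',
                   values = 'ENSG00000261150',
                   mart = "placeholder")

length(wrong$attributes)  # 1
length(right$attributes)  # 2
```

This would be consistent with the question's output, where theBM has only a single ensembl_gene_id column.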
For further help please provide the content of resdata, which you should always do when posting a question, since we cannot read minds. By the way, "does not work" is not a proper error description.
Once you have the output do:
resdata$genes <- gsub("\\..*", "", resdata$genes)
merge(x = theBM,
by.x = "ensembl_gene_id",
y = resdata,
by.y = "genes")
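The effect of this merge can be tried offline on toy data. In the sketch below, `resdata_toy`, `lookup_toy`, the `log2FC` column and the `SYMBOL_A`/`SYMBOL_B` values are made-up placeholders, not real annotations; only the gene IDs are taken from the question's output. It also shows `all.y = TRUE`, an optional merge argument that keeps genes with no match in the look-up table.

```r
# Toy stand-ins: the symbols and log2FC values are placeholders, not real data.
resdata_toy <- data.frame(
  genes = c("ENSG00000261150.2", "ENSG00000164877.18", "ENSG00000120334.15"),
  log2FC = c(1.2, -0.4, 2.1),
  stringsAsFactors = FALSE
)
lookup_toy <- data.frame(
  ensembl_gene_id = c("ENSG00000261150", "ENSG00000164877"),
  hgnc_symbol = c("SYMBOL_A", "SYMBOL_B"),
  stringsAsFactors = FALSE
)

# Merging on versioned IDs finds no matches, so the default inner join is empty.
before <- merge(lookup_toy, resdata_toy, by.x = "ensembl_gene_id", by.y = "genes")
nrow(before)  # 0

# Strip everything from the first literal dot onward, then merge again.
resdata_toy$genes <- sub("\\..*", "", resdata_toy$genes)
inner <- merge(lookup_toy, resdata_toy, by.x = "ensembl_gene_id", by.y = "genes")
nrow(inner)  # 2

# all.y = TRUE keeps genes without a look-up entry; their symbol becomes NA.
outer <- merge(lookup_toy, resdata_toy, by.x = "ensembl_gene_id", by.y = "genes",
               all.y = TRUE)
nrow(outer)  # 3
```

Whether to keep unmatched genes depends on the downstream analysis; the default inner join silently drops them.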
Note that I had to go to that SE crosspost to get the content of resdata; this is not how this goes. Please post all relevant data up front in the future, otherwise your questions might get downvoted and closed. Please also avoid cross-posting. If you provide proper information you usually get a good answer in time.
Edit: I just realized you also cross-posted this to Biostars, twice even. Please stop this. I closed the Biostars posts and gave my two cents on this behaviour over there.
Correct answer by ATpoint on June 23, 2021
Those aren't external_gene_name values, they're ensembl_gene_id_version values:
theBM <- getBM(attributes='ensembl_gene_id','hgnc_symbol',
filters = 'ensembl_gene_id_version',
values = charg,
mart = ensembl)
Note that you'll get more hits if you strip the gene ID versions off:
charg2 = sapply(strsplit(charg, '.', fixed=TRUE), function(x) x[1])
theBM = getBM(attributes='ensembl_gene_id','hgnc_symbol',
filters = 'ensembl_gene_id',
values = charg2,
mart = ensembl)
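The version suffix can be stripped in more than one way. As a rough sketch on three IDs copied from the question's output, the strsplit approach can be compared with a regex-based alternative (`by_split` and `by_regex` are names chosen here for illustration). Note that in a regex the dot has to be escaped, otherwise it matches any character.

```r
ids <- c("ENSG00000261150.2", "ENSG00000164877.18", "ENSG00000120334.15")

# Split on a literal dot (fixed = TRUE disables regex) and keep the first piece.
by_split <- sapply(strsplit(ids, ".", fixed = TRUE), function(x) x[1])

# Regex alternative: "\\." matches only a literal dot, followed by the digits
# of the version number at the end of the string.
by_regex <- sub("\\.[0-9]+$", "", ids)

identical(by_split, by_regex)  # TRUE

# An unescaped dot matches any character, so this pattern erases the whole ID.
erased <- gsub("..*", "", ids)
all(erased == "")  # TRUE
```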
Answered by Devon Ryan on June 23, 2021