[BioC] mis-matched gene symbols and entrez ID in biomaRt

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Sep 7 05:42:57 CEST 2011


Hi,

On Tue, Sep 6, 2011 at 11:24 PM, Wendy Qiao <wendy2.qiao at gmail.com> wrote:
> Hi all,
>
> I am converting the HGNC symbols from an Illumina human array to Entrez IDs
> using biomaRt. I found that some gene symbols are matched to many
> Entrez IDs, and vice versa. I am wondering how to solve the problem, so that
> each gene symbol is matched to only one Entrez ID. Or is there another
> package that I can use for matching gene symbols to Entrez IDs? Thank you in
> advance.
>
> Wendy
>
> =====
> In the following example, BAGE2, 3, 4 and 5 are matched to 85316 and 85317
> which are the Entrez IDs of BAGE5 and BAGE4, respectively.

Not sure why that's happening. Out of curiosity, is ensembl_mart_51 an
older version of the database? (I hardly ever use biomaRt.)
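
If you want to see how widespread the problem is, a quick tally of the
returned table helps. Below is a rough sketch in base R, assuming the getBM()
result is a data frame with the two columns shown in your output. The rows
are retyped by hand from rows 4-5 and 12-20 of that output, and the variable
names are my own:

```r
# Rows 4-5 and 12-20 of the table quoted in the original post, retyped by hand.
entrez <- data.frame(
  hgnc_symbol = c("HMX1", "HMX1",
                  "BAGE5", "BAGE4", "BAGE3", "BAGE2",
                  "BAGE5", "BAGE4", "BAGE3", "BAGE2", "NBR1"),
  entrezgene  = c(NA, 3166,
                  85316, 85316, 85316, 85316,
                  85317, 85317, 85317, 85317, 4077),
  stringsAsFactors = FALSE
)

# Drop rows without an Entrez ID, then drop exact duplicate pairs.
pairs <- unique(entrez[!is.na(entrez$entrezgene), ])

# Count distinct Entrez IDs per symbol, and distinct symbols per Entrez ID.
ids_per_symbol <- tapply(pairs$entrezgene, pairs$hgnc_symbol,
                         function(x) length(unique(x)))
symbols_per_id <- tapply(pairs$hgnc_symbol, pairs$entrezgene,
                         function(x) length(unique(x)))

# Anything with a count above 1 is part of a one-to-many mapping.
ambiguous_symbols <- names(ids_per_symbol)[ids_per_symbol > 1]
ambiguous_ids     <- names(symbols_per_id)[symbols_per_id > 1]
```

On these rows, the four BAGE symbols and both of their Entrez IDs get
flagged, while HMX1 and NBR1 come out as clean one-to-one pairs once the NA
row is dropped.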

Anyway, it seems that using the org.Hs.eg.db package works fine here:

R> library(org.Hs.eg.db)
R> mget(paste("BAGE", 2:5, sep=""), revmap(org.Hs.egSYMBOL), ifnotfound=NA)
$BAGE2
[1] "85319"

$BAGE3
[1] "85318"

$BAGE4
[1] "85317"

$BAGE5
[1] "85316"

... and you get the added bonus of not having to fire your query "over
the wire".
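
Since mget() hands back a list, you may still want to check that every
symbol really has exactly one ID before flattening it. Here is a minimal
sketch, assuming a list shaped like the output above. I typed it in by hand
so it runs without org.Hs.eg.db installed, and "NOTAGENE" is a made-up
symbol standing in for something that hits ifnotfound=NA:

```r
# Stand-in for the mget() result, typed by hand; "NOTAGENE" is hypothetical
# and mimics a symbol that was not found (ifnotfound=NA).
hits <- list(BAGE2 = "85319", BAGE3 = "85318",
             BAGE4 = "85317", BAGE5 = "85316",
             NOTAGENE = NA)

# Number of non-missing Entrez IDs found for each symbol.
n_ids <- vapply(hits, function(x) sum(!is.na(x)), integer(1))

# Symbols with exactly one ID become a named character vector;
# everything else (no hit, or several hits) is set aside for a manual look.
one_to_one   <- unlist(hits[n_ids == 1])
needs_review <- names(hits)[n_ids != 1]
```

Here one_to_one can be indexed by symbol (e.g. one_to_one["BAGE4"]), and
needs_review lists whatever you would have to resolve by hand.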

HTH,
-steve


>
> library('biomaRt')
> ensembl <- useMart("ensembl_mart_51", dataset = "hsapiens_gene_ensembl",
>                    archive = TRUE)
> Entrez <- getBM(attributes = c("hgnc_symbol", "entrezgene"),
>                 filters = "hgnc_symbol", values = GeneList, mart = ensembl)
> # class(GeneList) = factor
>
> Entrez[1:20,]
>   hgnc_symbol entrezgene
> 1        ZFP62      92379
> 2     C9orf169     375791
> 3       FAM72D     653573
> 4         HMX1         NA
> 5         HMX1       3166
> 6        ZFP62         NA
> 7        RSPO4     343637
> 8        DOC2B       8447
> 9      C8orf42     157695
> 10       TTTY8         NA
> 11       A26C3         NA
> 12       BAGE5      85316
> 13       BAGE4      85316
> 14       BAGE3      85316
> 15       BAGE2      85316
> 16       BAGE5      85317
> 17       BAGE4      85317
> 18       BAGE3      85317
> 19       BAGE2      85317
> 20        NBR1       4077



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


