[BioC] Converting EnSeMBL Probe names into Gene Name

Fri Sep 19 16:38:54 CEST 2008

Hi Jim (and others),

Since this topic is of interest to me as well: do you have any pointers
on how to construct an 'org.Hs.xx.db'-style package based on Ensembl IDs,
using the direct mappings from biomaRt?
In other words, I do know how to map Ensembl IDs to gene symbol, name,
GO class, etc. using biomaRt, but I would like to 'merge' these separate
files in such a way as to get a new-style annotation db package keyed on
Ensembl IDs (thus avoiding the use of intermediate Entrez IDs). Or is this
by definition an impossible task?
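
To make the 'merge' step concrete, here is a minimal base R sketch. It
assumes each biomaRt export is a table keyed by Ensembl gene ID; the
table and column names are made up for illustration, and the values are
simply the ones from Jim's output quoted below. It only gives a flat
lookup table, not a real annotation db package.

```r
# Toy stand-ins for two separate biomaRt exports, each keyed by Ensembl gene ID.
# Table and column names are illustrative assumptions, not biomaRt output.
ens2symbol <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
  symbol = c("TSPAN6", "TNMD", "DPM1"),
  stringsAsFactors = FALSE
)
ens2entrez <- data.frame(
  ensembl_gene_id = c("ENSG00000000419", "ENSG00000000003", "ENSG00000000005"),
  entrez_id = c("8813", "7105", "64102"),
  stringsAsFactors = FALSE
)

# Join on the Ensembl ID; all = TRUE keeps IDs that occur in only one table.
annot <- merge(ens2symbol, ens2entrez, by = "ensembl_gene_id", all = TRUE)

# Index rows by Ensembl ID so lookups need no intermediate identifier.
rownames(annot) <- annot$ensembl_gene_id
annot["ENSG00000000005", "symbol"]
```

Further attributes (gene name, GO class, ...) could be joined on in the
same way, one merge() per table.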

Thanks,
Guido

------------------------------------------------ 
Guido Hooiveld, PhD 
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition 
Wageningen University 
Biotechnion, Bomenweg 2 
NL-6703 HD Wageningen 
the Netherlands 
tel: (+)31 317 485788 
fax: (+)31 317 483342 
internet:   http://nutrigene.4t.com
email:      guido.hooiveld at wur.nl

> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch 
> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of 
> James W. MacDonald
> Sent: 18 September 2008 14:24
> To: Gundala Viswanath
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Converting EnSeMBL Probe names into Gene Name
> 
> Another alternative is to use the org.Hs.eg.db package
> 
>  > library(org.Hs.eg.db)
> Loading required package: AnnotationDbi
> Loading required package: Biobase
> Loading required package: tools
> 
> Welcome to Bioconductor
> 
>    Vignettes contain introductory material. To view, type
>    'openVignette()'. To cite Bioconductor, see
>    'citation("Biobase")' and for packages 'citation(pkgname)'.
> 
> Loading required package: DBI
> Loading required package: RSQLite
>  > ens <- c("ENSG00000000003","ENSG00000000005","ENSG00000000419")
>  > egs <- mget(ens, revmap(org.Hs.egENSEMBL))
>  > egs
> $ENSG00000000003
> [1] "7105"
> 
> $ENSG00000000005
> [1] "64102"
> 
> $ENSG00000000419
> [1] "8813"
> 
>  > gns <- mget(unlist(egs), org.Hs.egSYMBOL)
>  > gns
> $`7105`
> [1] "TSPAN6"
> 
> $`64102`
> [1] "TNMD"
> 
> $`8813`
> [1] "DPM1"
> 
> Since most BioC annotation packages are Entrez Gene-centric, 
> you will need to map via the Entrez Gene ID, whereas you can 
> do the direct mapping using biomaRt.
> 
> Best,
> 
> Jim
> 
> Sean Davis wrote:
> > On Thu, Sep 18, 2008 at 4:37 AM, Gundala Viswanath <gundalav at gmail.com> wrote:
> >> Dear all,
> >>
> >> Is there a way with Bioconductor in which I can convert such EnSemBL
> >> probe names into the standard gene names?
> >>
> >> AFFX-M27830_5_at
> >> AFFX-M27830_M_at
> >> ENSG00000000003_at
> >> ENSG00000000005_at
> >> ENSG00000000419_at
> > 
> > Hi, Gundala.  In general, you do not need to cross-post to both 
> > bioconductor and R lists.
> > 
> > These are not standard Ensembl names.  You could strip off the "_at"
> > and some of them would become Ensembl gene names (the ones that begin
> > with ENSG; the others look like affy control probes).  Then, you could
> > use biomart to get information about them.  See the biomart vignette
> > and help pages for assistance.
> > 
> > Sean
> > 
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: 
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> --
> James W. MacDonald, M.S.
> Biostatistician
> Hildebrandt Lab
> 8220D MSRB III
> 1150 W. Medical Center Drive
> Ann Arbor MI 48109-0646
> 734-936-8662
> 
>
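
For reference, Sean's suggestion of stripping the "_at" suffix can be
sketched in base R roughly as follows. The variable names are mine, and
the pattern assumes that Ensembl gene IDs are "ENSG" followed by digits
only.

```r
# Probe names as posted in the original question.
probes <- c("AFFX-M27830_5_at", "AFFX-M27830_M_at",
            "ENSG00000000003_at", "ENSG00000000005_at", "ENSG00000000419_at")

# Drop the trailing "_at" suffix (anchored at the end of the string).
ids <- sub("_at$", "", probes)

# Keep only IDs that look like Ensembl gene IDs; the AFFX-* control
# probes do not match the pattern and are dropped.
ens <- ids[grepl("^ENSG[0-9]+$", ids)]
ens
```

The resulting 'ens' vector is the same as the one in Jim's mget()
example above, and could equally be used as the filter values in a
biomaRt query.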