[BioC] Converting EnSeMBL Probe names into Gene Name

Sean Davis sdavis2 at mail.nih.gov
Fri Sep 19 16:52:31 CEST 2008


On Fri, Sep 19, 2008 at 10:38 AM, Hooiveld, Guido <Guido.Hooiveld at wur.nl> wrote:
> Hi Jim (and others),
>
> Since this topic is of interest to me as well, do you have any pointers
> on how to construct an 'org.Hs.xx.db' package based on Ensembl IDs using
> the direct mappings from biomaRt?
> In other words: I know how to map Ensembl IDs to gene symbol, name,
> GO class, etc. using biomaRt, but I would like to 'merge' these separate
> files in such a way as to get a new-style annotation db package based on
> Ensembl IDs (thus avoiding the use of intermediate Entrez IDs). Or is this
> by definition an impossible task?

See the SQLForge documentation in the AnnotationDbi package.  You can use
the list of Ensembl IDs and their corresponding Entrez Gene IDs to
construct a new annotation db package.  Alternatively, you could get
the Ensembl-to-Entrez Gene relationship using biomaRt.  The final
products will be similar, but probably not identical.  Also, keep in
mind that the actual data in the org.db packages are based on NCBI
annotation, even though the key would be an Ensembl ID.

With all that said, the simpler approach is to convert your entire
list to Entrez Gene IDs using either the org.Hs mappings or biomaRt,
and then proceed with the Entrez Gene ID as the key.
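
As a rough illustration of that workflow, here is a minimal base-R sketch
using the probe names from the original question.  The two lookup vectors
(ens2eg, eg2symbol) are hypothetical stand-ins, hard-coded from the values
shown in Jim's output quoted below; in a real analysis they would come from
the org.Hs.eg.db mappings or from a biomaRt query.

```r
# Probe IDs as posted in the thread: two Affymetrix controls and three
# Ensembl gene IDs carrying a "_at" suffix.
probes <- c("AFFX-M27830_5_at", "AFFX-M27830_M_at",
            "ENSG00000000003_at", "ENSG00000000005_at", "ENSG00000000419_at")

# Strip the trailing "_at" and keep only IDs that look like Ensembl genes.
ids <- sub("_at$", "", probes)
is_ensembl <- grepl("^ENSG[0-9]+$", ids)
ens <- ids[is_ensembl]

# ASSUMPTION: hard-coded stand-ins for the annotation lookups, built only
# from the values shown in this thread.  Replace them with org.Hs.eg.db or
# biomaRt results in practice.
ens2eg <- c(ENSG00000000003 = "7105",
            ENSG00000000005 = "64102",
            ENSG00000000419 = "8813")
eg2symbol <- c("7105" = "TSPAN6", "64102" = "TNMD", "8813" = "DPM1")

# Convert everything to Entrez Gene IDs first, then use those as the key.
# IDs missing from a lookup come back as NA rather than raising an error.
eg <- unname(ens2eg[ens])
result <- data.frame(probe   = probes[is_ensembl],
                     ensembl = ens,
                     entrez  = eg,
                     symbol  = unname(eg2symbol[eg]),
                     stringsAsFactors = FALSE)
print(result)
```

The AFFX control probes are dropped before any lookup, so only the three
ENSG-derived IDs reach the Entrez Gene step.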

Sean


> Thanks,
> Guido
>
> ------------------------------------------------
> Guido Hooiveld, PhD
> Nutrition, Metabolism & Genomics Group
> Division of Human Nutrition
> Wageningen University
> Biotechnion, Bomenweg 2
> NL-6703 HD Wageningen
> the Netherlands
> tel: (+)31 317 485788
> fax: (+)31 317 483342
> internet:   http://nutrigene.4t.com
> email:      guido.hooiveld at wur.nl
>
>
>
>> -----Original Message-----
>> From: bioconductor-bounces at stat.math.ethz.ch
>> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of
>> James W. MacDonald
>> Sent: 18 September 2008 14:24
>> To: Gundala Viswanath
>> Cc: bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] Converting EnSeMBL Probe names into Gene Name
>>
>> Another alternative is to use the org.Hs.eg.db package
>>
>>  > library(org.Hs.eg.db)
>> Loading required package: AnnotationDbi
>> Loading required package: Biobase
>> Loading required package: tools
>>
>> Welcome to Bioconductor
>>
>>    Vignettes contain introductory material. To view, type
>>    'openVignette()'. To cite Bioconductor, see
>>    'citation("Biobase")' and for packages 'citation(pkgname)'.
>>
>> Loading required package: DBI
>> Loading required package: RSQLite
>>  > ens <- c("ENSG00000000003","ENSG00000000005","ENSG00000000419")
>>  > egs <- mget(ens, revmap(org.Hs.egENSEMBL))
>>  > egs
>> $ENSG00000000003
>> [1] "7105"
>>
>> $ENSG00000000005
>> [1] "64102"
>>
>> $ENSG00000000419
>> [1] "8813"
>>
>>  > gns <- mget(unlist(egs), org.Hs.egSYMBOL)
>>  > gns
>> $`7105`
>> [1] "TSPAN6"
>>
>> $`64102`
>> [1] "TNMD"
>>
>> $`8813`
>> [1] "DPM1"
>>
>> Since most BioC annotation packages are Entrez Gene-centric,
>> you will need to map via the Entrez Gene ID, whereas you can
>> do the direct mapping using biomaRt.
>>
>> Best,
>>
>> Jim
>>
>> Sean Davis wrote:
>> > On Thu, Sep 18, 2008 at 4:37 AM, Gundala Viswanath <gundalav at gmail.com> wrote:
>> >> Dear all,
>> >>
>> >> Is there a way with Bioconductor in which I can convert such EnSemBL
>> >> probe names into the standard gene names?
>> >>
>> >> AFFX-M27830_5_at
>> >> AFFX-M27830_M_at
>> >> ENSG00000000003_at
>> >> ENSG00000000005_at
>> >> ENSG00000000419_at
>> >
>> > Hi, Gundala.  In general, you do not need to cross-post to both
>> > bioconductor and R lists.
>> >
>> > These are not standard Ensembl names.  You could strip off the "_at"
>> > and some of them would become Ensembl gene names (the ones that begin
>> > with ENSG; the others look like affy control probes).  Then, you could
>> > use biomart to get information about them.  See the biomart vignette
>> > and help pages for assistance.
>> >
>> > Sean
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at stat.math.ethz.ch
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives:
>> > http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> Hildebrandt Lab
>> 8220D MSRB III
>> 1150 W. Medical Center Drive
>> Ann Arbor MI 48109-0646
>> 734-936-8662
>>


