[BioC] Entrez Gene ID to Probe Set Name

Marc Carlson mcarlson at fhcrc.org
Fri Oct 24 19:42:43 CEST 2008


Hi Monnie,

This is pretty easy once you know about the revmap() function. 
Here is a quick example:

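library(hgu95av2.db)
## revmap() reverses the probe set -> Entrez Gene ID map, so this
## returns the probe set(s) annotated to Entrez Gene ID 1557
mget("1557", revmap(hgu95av2ENTREZID))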
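A probe-set annotation map goes from probe set to Entrez Gene ID, and on Affymetrix arrays several probe sets can share one Entrez ID, so the reverse lookup gives a vector of probe sets per gene. To show what "reversing" the map means, here is a minimal base-R sketch that inverts a small made-up probe-to-Entrez list with split(). The probe IDs and mappings are hypothetical, and this is not how AnnotationDbi implements revmap() internally.

```r
## Hypothetical forward map: probe set -> Entrez Gene ID (toy data,
## not real hgu95av2 annotation)
probe2eg <- list(p1_at = "100", p2_at = "200",
                 p3_at = "1557", p4_at = "1557")

## Invert it: group the probe-set names by the Entrez ID they map to,
## giving a list of Entrez ID -> character vector of probe sets
eg2probe <- split(names(probe2eg), unlist(probe2eg))

eg2probe[["1557"]]   # c("p3_at", "p4_at")
```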


Also, if you want to know more, you might want to look at the
AnnotationDbi vignette:

http://www.bioconductor.org/packages/2.4/bioc/html/AnnotationDbi.html


  Marc




McGee, Monnie wrote:
> Here is the previous query with a more descriptive subject.
>
>
> -----Original Message-----
> From: McGee, Monnie
> Sent: Thu 10/23/2008 11:14 AM
> To: bioconductor at stat.math.ethz.ch
> Subject: RE: Bioconductor Digest, Vol 68, Issue 23
>  
> Dear List,
>
> Is there an elegant way to obtain the name of a probe set from an Affymetrix platform (doesn't matter which one) corresponding to a given Entrez Gene ID?  It seems fairly simple to obtain the Entrez ID if you have a probe set, but the reverse problem seems non-trivial (at least to me).
>
> There's no particular reason I need to know.  I just want to know if it's possible.
>
> Thanks!
> Monnie
>
> Monnie McGee, Ph.D.
> Associate Professor
> Department of Statistical Science
> Southern Methodist University
> Ph: 214-768-2462
> Fax: 214-768-4035
>
>
>
> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch on behalf of bioconductor-request at stat.math.ethz.ch
> Sent: Thu 10/23/2008 5:00 AM
> To: bioconductor at stat.math.ethz.ch
> Subject: Bioconductor Digest, Vol 68, Issue 23
>  
> Send Bioconductor mailing list submissions to
> 	bioconductor at stat.math.ethz.ch
>
> To subscribe or unsubscribe via the World Wide Web, visit
> 	https://stat.ethz.ch/mailman/listinfo/bioconductor
> or, via email, send a message with subject or body 'help' to
> 	bioconductor-request at stat.math.ethz.ch
>
> You can reach the person managing the list at
> 	bioconductor-owner at stat.math.ethz.ch
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Bioconductor digest..."
>
>
> Today's Topics:
>
>    1. GOstat: listing genes from hyperGTest (Tim Smith)
>    2. export toptables into Genespring (Pemmasani, Kalyani)
>    3. Re: Limma contrasts question (James W. MacDonald)
>    4. Re: GOstat: listing genes from hyperGTest (James W. MacDonald)
>    5. Re: Limma contrasts question (Daniel Brewer)
>    6. quality assessment and preprocessing for tiling array-based
>       CGH data (Leon Yee)
>    7. GOstats and org.EcK12.eg.db (Robert Castelo)
>    8. Re: quality assessment and preprocessing for tiling
>       array-based CGH data (Sean Davis)
>    9. Re: GOstat: listing genes from hyperGTest (Tim Smith)
>   10. Re: quality assessment and preprocessing for tiling
>       array-based CGH data (Leon Yee)
>   11. Re: Beadarray and illumina methylation arrays (Mark Dunning)
>   12. Re: quality assessment and preprocessing for tiling
>       array-based CGH data (Sean Davis)
>   13. Problem using Rgraphviz (edge weights going missing). (Dan Bolser)
>   14. Re: newbie problems with AnnBuilder (Mark Kimpel)
>   15. Re: newbie problems with AnnBuilder (Sean Davis)
>   16. Re: newbie problems with AnnBuilder (Mark Kimpel)
>   17. Re: GOstats and org.EcK12.eg.db (Robert Gentleman)
>   18. Re: quality assessment and preprocessing for tiling
>       array-based CGH data (Leon Yee)
>   19. Bioconductor installation problem: unable to access
>       repository (Shinichiro Wachi)
>   20. Re: quality assessment and preprocessing for tiling
>       array-based CGH data (Sean Davis)
>   21. Re: GOstat: listing genes from hyperGTest (James W. MacDonald)
>   22. Re: Bioconductor installation problem: unable to	access
>       repository (Patrick Aboyoun)
>   23. Bioconductor 2.3 is released (Patrick Aboyoun)
>   24. Re: How to save result from limma (Jenny Drnevich)
>   25. scale questions (Hui-Yi Chu)
>   26. Re: [Fwd: batch info for cellHTS] (Florian Hahne)
>   27. problem with Category package and custom annotationDbi
>       (Mark Kimpel)
>   28. Re: problem with Category package and custom annotationDbi
>       (Marc Carlson)
>   29. Re: scale questions (Sean Davis)
>   30. Re: scale questions (Sean Davis)
>   31. Re: problem with Category package and custom annotationDbi
>       (Mark Kimpel)
>   32. Re: How to save result from limma (Gordon K Smyth)
>   33. Package "xps" "import.expr.scheme" error (Wei,Caimiao)
>   34. Re: Lumi and Beadstudio 1.5.13 (Leon Peshkin)
>   35. Exceptional offer following the technical problem
>       (Clara de Dessous Chéri)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 22 Oct 2008 03:43:33 -0700 (PDT)
> From: Tim Smith <tim_smith_666 at yahoo.com>
> Subject: [BioC] GOstat: listing genes from hyperGTest
> To: bioc <bioconductor at stat.math.ethz.ch>
> Message-ID: <257981.79114.qm at web58005.mail.re3.yahoo.com>
> Content-Type: text/plain
>
>
> Hi,
>
> I was performing a hyperGTest for genes in Homo sapiens. For a set of
> input genes, this function returns some 'significant' GO terms. What I
> wanted to do next was to relate each significant GO term (returned by
> this function) to the genes (from my set of input genes) associated with
> that GO term. However, I think that I may be using the wrong
> package/function to get the relevant set of genes.
>
> Currently, what I'm doing is finding the significant GO terms by using the following code:
>
> -----------------------
> ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs 
>
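> library(GOstats)       # GOHyperGParams and hyperGTest()
> library(org.Hs.eg.db)  # human Entrez Gene annotation
>
> ## Over-representation test of Biological Process terms
>  paramsGO <- new("GOHyperGParams", geneIds = genes1,
>           universeGeneIds = allGenes, annotation = "org.Hs.eg.db", 
>           ontology = "BP", pvalueCutoff = 1, conditional = FALSE, 
>           testDirection = "over")
>
> GO <- hyperGTest(paramsGO)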
> --------------------------
> This gives me a set of significant GO terms. Now, I would like to find which
> subset of genes in 'genes1' is associated with each of the significant
> GO terms. To do this I map all GO terms to their Entrez IDs with the
> 'org.Hs.eg.db' package, using the following:
>
> xx <- as.list(org.Hs.egGO2EG)
>
> to get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't
> this number small?) that map to at least one Entrez ID. From there I
> look up which Entrez IDs are associated with my GO term of interest.
>
> My problem is that the GO term from hyperGTest is often not associated
> with any Entrez ID in the list obtained above from
> as.list(org.Hs.egGO2EG); that is, the GO ID does not appear in that list
> at all. For example, the term 'GO:0043284' is returned by
> hyperGTest, but does not appear to be associated with any Entrez IDs in
> the org.Hs.eg.db package. Where could I be going wrong?
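The lookup described above amounts to intersecting each significant term's annotated genes with the input genes. A minimal base-R sketch of that step, assuming go2eg has the shape of as.list(org.Hs.egGO2EG) (a list from GO ID to a vector of Entrez IDs). The term names, gene IDs and term list are all hypothetical, and the third term is deliberately absent from the map to mirror the situation described.

```r
## Hypothetical stand-ins: go2eg mimics the shape of as.list(org.Hs.egGO2EG)
## (term -> vector of Entrez IDs); all IDs here are made up.
go2eg <- list(termA = c("10", "20", "30"),
              termB = c("20", "40"))
genes1   <- c("20", "30", "50")           # input genes
sigTerms <- c("termA", "termB", "termC")  # "significant" terms

## For each term, keep the input genes annotated to it. A term missing
## from go2eg gives go2eg[[term]] == NULL, so %in% is all FALSE and the
## result is character(0) rather than an error.
genesPerTerm <- setNames(
  lapply(sigTerms, function(term) genes1[genes1 %in% go2eg[[term]]]),
  sigTerms)
```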
>
> I would give a set of genes so that the example is reproducible, but [[elided Yahoo spam]]
>
> Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here....
>
> My sessionInfo() is:
> --------------------------------
> R version 2.7.2 (2008-08-25) 
> i386-pc-mingw32 
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
>  [1] grid      splines   tools     stats     graphics  grDevices utils     datasets  methods   base     
>
> other attached packages:
>  [1] gplots_2.6.0         gmodels_2.14.1       gtools_2.4.0         gdata_2.4.1          Rgraphviz_1.18.1     GOstats_2.6.0        Category_2.6.0      
>  [8] RBGL_1.16.0          annotate_1.18.0      xtable_1.5-2         graph_1.18.0         PFAM.db_2.2.0        GO.db_2.2.0          KEGG.db_2.2.0       
> [15] org.Hs.eg.db_2.2.0   AnnotationDbi_1.2.0  RSQLite_0.6-8        DBI_0.2-4            genefilter_1.20.0    survival_2.34-1      affy_1.18.0         
> [22] preprocessCore_1.2.0 affyio_1.8.0         Biobase_2.0.0       
>
> loaded via a namespace (and not attached):
> [1] cluster_1.11.11 MASS_7.2-44    
>
>
> ---------------------------------
>
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 22 Oct 2008 12:34:38 +0100
> From: "Pemmasani, Kalyani" <kalyani.pemmasani at nuigalway.ie>
> Subject: [BioC] export toptables into Genespring
> To: <bioconductor at stat.math.ethz.ch>
> Message-ID:
> 	<6B017AD2AE2F6F489087FC986588136B88FA42 at EVS1.ac.nuigalway.ie>
> Content-Type: text/plain;	charset="iso-8859-1"
>
>
> Hi all,
>
> Is there a way to export topTable results from limma into GeneSpring (from Agilent Technologies) for clustering?
>
> Best regards,
> Kalyani
> -------------------------------------------
> Kalyani Pemmasani
> Marie Curie research fellow
> National Diagnostics Centre
> National University of Ireland
> Galway, IRELAND
> e.mail: kalyani.pemmasani at nuigalway.ie
> Ph.no: +353(0)91492815
> Fax: +353 (0) 91586570 
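One common route, sketched here under the assumption that the target program can import a tab-delimited text file (check GeneSpring's own import documentation for the exact format it expects), is to write the topTable() results out with write.table(). The data frame below is a made-up stand-in for topTable() output; the column subset and file name are hypothetical.

```r
## Made-up stand-in for limma's topTable() output
tt <- data.frame(ID      = c("p1_at", "p2_at"),
                 logFC   = c(1.8, -0.9),
                 P.Value = c(0.0004, 0.02),
                 stringsAsFactors = FALSE)

## Write a plain tab-delimited file: header row, no quoting, no row names
outFile <- file.path(tempdir(), "toptable.txt")
write.table(tt, outFile, sep = "\t", quote = FALSE, row.names = FALSE)

## Read it back to check the round trip
back <- read.delim(outFile, stringsAsFactors = FALSE)
```

Dropping quotes and row names keeps the file to one header line plus one row per probe set, which is the simplest layout for most spreadsheet-style importers.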
>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 22 Oct 2008 09:07:16 -0400
> From: "James W. MacDonald" <jmacdon at med.umich.edu>
> Subject: Re: [BioC] Limma contrasts question
> To: Daniel Brewer <daniel.brewer at icr.ac.uk>
> Cc: bioconductor at stat.math.ethz.ch
> Message-ID: <48FF2584.5010509 at med.umich.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Daniel Brewer wrote:
>
>   
>> Hi Jim,
>>
>> Could you go into the maths of the contrast formulas a bit?  I would
>> like to get a really solid understanding of what it is doing for future
>> analyses.
>>     
>
> Once you understand what the coefficients are, the contrasts are just 
> simple algebra. In your case, all of the coefficients are estimating the 
> difference between the sample and PC3M (e.g., Knockdown - PC3M).
>
> So the algebra is something like this:
>
> 2(Knockdown - PC3M) - (Scramble - PC3M)
> =
> 2Knockdown - 2PC3M - Scramble + PC3M
> =
> 2Knockdown - Scramble - PC3M
> =
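> 2(Knockdown - (Scramble + PC3M)/2)
>
> Which is (twice) knockdown minus the mean of the controls. The factor of 2 scales the contrast and its standard error alike, so it does not change the t-statistic.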
>
> Note that this will be the numerator of the resulting t-statistic. The 
> denominator will be sort of an average of the variability within each of 
> the three groups being compared. So the question being answered is 'What 
> genes are different in Knockdown as compared to the average of the 
> controls?'. However, there is nothing here to test if the two controls 
> are similar at all (and you might not care).
>
> So for instance, you might have a gene with average expression like this:
>
> Knockdown = 10
> PC3M = 4
> Scramble = 7
>
> If the intra-group variability is small for each sample type, then you 
> will likely get a significant t-statistic even though the two controls 
> are probably significantly different as well. Which is why I mentioned 
> earlier that you might want to test the Scramble - PC3M contrast as well.
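As a quick numeric check of the algebra, here are the example means above plugged in, with the two coefficients expressed relative to PC3M and the contrast written as the weight vector c(2, -1) on them (the kind of weights a limma contrast matrix holds). This is plain arithmetic on group means, not a limma fit, so it says nothing about the variances.

```r
## Group means from the example above
mu <- c(Knockdown = 10, PC3M = 4, Scramble = 7)

## Coefficients as described: each group relative to PC3M
coefs <- c(KnockdownVsPC3M = mu[["Knockdown"]] - mu[["PC3M"]],
           ScrambleVsPC3M  = mu[["Scramble"]]  - mu[["PC3M"]])

## Contrast 2(Knockdown - PC3M) - (Scramble - PC3M) as weights on coefs
w <- c(2, -1)
contrastValue <- sum(w * coefs)

## Knockdown minus the mean of the two controls
vsMeanControls <- mu[["Knockdown"]] - mean(mu[c("PC3M", "Scramble")])
```

The contrast works out to 9, twice the knockdown-minus-mean-of-controls value of 4.5, and the Scramble - PC3M coefficient of 3 is the control-versus-control difference that the extra contrast would test.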
>
> Best,
>
> Jim
>
>
>   
>> Many thanks
>>
>> Dan
>>
>>     
>
>


