[BioC] hyperGTest, KEGG

Seth Falcon sfalcon at fhcrc.org
Sat Apr 14 03:31:29 CEST 2007


Hi Ivan,

ivan.borozan at utoronto.ca writes:
> I've used the script below to calculate over-represented KEGG  
> categories; however, I cannot get to the gene IDs associated with each of  
> the over-represented KEGG terms/pathways.

I've been working on making results easier to work with and also
improving the documentation.  This is all happening in the devel arm
(which will soon become the next release).

With a (very) recent version of Category you can get help on all
accessors for the result objects returned by hyperGTest:

     help("HyperGResult-accessors")

> My question: does catToGeneId() exist, and how do I get to the genes that  
> are associated with each of the above pathways?

To get the mapping from each category to the gene IDs in the universe
(here ans is the result object returned by hyperGTest):

    > geneIdUniverse(ans)[1:2]
    $`00625`
     [1] "YCR105W" "YCR107W" "YDL243C" "YDR368W" "YFL056C" "YHR104W" "YJR155W"
     [8] "YKR009C" "YNL331C" "YOR120W"
    
    $`04010`
     [1] "YAL041W" "YBL016W" "YBL105C" "YBR083W" "YBR200W" "YCL027W" "YDL159W"
     [8] "YDL235C" "YDR103W" "YDR461W" "YDR480W" "YER111C" "YER118C" "YFL026W"
    [15] "YGL089C" "YGR032W" "YGR040W" "YGR088W" "YHL007C" "YHR005C" "YHR030C"
    [22] "YHR084W" "YIL147C" "YJL095W" "YJL128C" "YJL157C" "YJR086W" "YKL062W"
    [29] "YKL178C" "YKR095W" "YLR006C" "YLR113W" "YLR182W" "YLR229C" "YLR332W"
    [36] "YLR342W" "YLR362W" "YML004C" "YMR037C" "YMR043W" "YNL053W" "YNL098C"
    [43] "YNL145W" "YNL271C" "YNL283C" "YNR031C" "YOL105C" "YOR008C" "YOR212W"
    [50] "YOR231W" "YPL049C" "YPL089C" "YPL140C" "YPL187W" "YPR165W"

To get the mapping from each category to the _selected_ gene IDs:

    > geneIdsByCategory(ans)[1:2]
    $`00625`
    [1] "YOR120W"
    
    $`04010`
    [1] "YFL026W" "YLR342W"

The number of selected genes in each category (simply the length of
each element of the list above):

    > geneCounts(ans)[1:2]
    00625 04010 
        1     2 
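The relationship between the three accessors can be sketched in plain R.
This is a minimal illustration, not how Category computes its results
internally: it assumes universe is an abbreviated copy of the
geneIdUniverse(ans) output above and that selected is a hypothetical
vector standing in for the gene IDs supplied to the test.  Filtering each
universe element by selected gives the geneIdsByCategory(ans) view, and
taking lengths gives the geneCounts(ans) view.

```r
# Category -> gene universe, copied from geneIdUniverse(ans)[1:2] above.
# NOTE: the 04010 element is abbreviated to four of its 55 IDs.
universe <- list(
  `00625` = c("YCR105W", "YCR107W", "YDL243C", "YDR368W", "YFL056C",
              "YHR104W", "YJR155W", "YKR009C", "YNL331C", "YOR120W"),
  `04010` = c("YAL041W", "YFL026W", "YLR342W", "YPR165W")
)

# Hypothetical selected gene list (the IDs supplied to the test);
# only these three are known from the output above
selected <- c("YOR120W", "YFL026W", "YLR342W")

# For each category, keep only the universe genes that were selected;
# this mirrors what geneIdsByCategory(ans) reports
byCategory <- lapply(universe, function(ids) ids[ids %in% selected])

# Number of selected genes per category; this mirrors geneCounts(ans)
counts <- vapply(byCategory, length, integer(1))
print(counts)
```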

NOTE: I used the YEAST annotation data package as an example.  It is
atypical in that its base identifiers are yeast systematic ORF names
(e.g. YOR120W) rather than Entrez Gene IDs.  For your example, you will
get Entrez Gene IDs, and you can map those to gene symbols (SYMBOL)
using the appropriate annotation data package.
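Mapping each category's IDs to symbols amounts to a lookup applied
element by element.  A minimal sketch, assuming a named character vector
idToSymbol (hypothetical, with placeholder values) that you have built
from the SYMBOL map of whichever annotation data package matches your
data; IDs missing from the lookup come back as NA.

```r
# Category -> selected gene IDs, as returned by geneIdsByCategory(ans) above
byCategory <- list(`00625` = "YOR120W",
                   `04010` = c("YFL026W", "YLR342W"))

# Hypothetical ID -> symbol lookup with placeholder values; in practice,
# build this from the appropriate annotation data package
idToSymbol <- c(YOR120W = "SYM1", YFL026W = "SYM2")

# Look up every ID in each category; unname() drops the ID names, and
# IDs absent from the lookup (here YLR342W) become NA
symbolsByCategory <- lapply(byCategory, function(ids) unname(idToSymbol[ids]))
print(symbolsByCategory)
```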

The above examples were done using:

R 2.5.0 beta, Category 2.1.36

    > sessionInfo()
    R version 2.5.0 beta (--) 
    powerpc-apple-darwin8.9.0 
    
    locale:
    C
    
    attached base packages:
    [1] "splines"   "tools"     "stats"     "graphics"  "grDevices" "datasets" 
    [7] "utils"     "methods"   "base"     
    
    other attached packages:
            YEAST      Category AnnotationDbi       RSQLite           DBI 
        "1.15.13"      "2.1.36"      "0.0.58"       "0.5-4"       "0.2-1" 
           Matrix       lattice    genefilter      survival      annotate 
      "0.9975-11"      "0.15-3"     "1.13.12"        "2.31"      "1.13.7" 
               GO          KEGG         graph       Biobase 
        "1.15.13"     "1.15.13"     "1.13.10"     "1.13.48" 

Hope that helps.

 + seth

-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org