[BioC] GOHyperG for KEGG

Dick Beyer dbeyer at u.washington.edu
Wed Sep 14 19:09:00 CEST 2005


Hi Tao,

I tried out your KEGGHyperG function.  Seems to work just great.  Thanks very much,
Dick

*******************************************************************************
Richard P. Beyer, Ph.D.	University of Washington
Tel.:(206) 616 7378	Env. & Occ. Health Sci. , Box 354695
Fax: (206) 685 4696	4225 Roosevelt Way NE, # 100
 			Seattle, WA 98105-6099
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer
*******************************************************************************

--- "Shi, Tao" <shidaxia at yahoo.com> wrote:

Message: 8
Date: Tue, 6 Sep 2005 10:27:05 -0700 (PDT)
From: "Shi, Tao" <shidaxia at yahoo.com>
Subject: Re: [BioC] GOHyperG for KEGG
To: Gunnar Wrobel <bioc at gunnarwrobel.de>
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <20050906172705.66173.qmail at web52703.mail.yahoo.com>
Content-Type: text/plain; charset=iso-8859-1

Thank you very much, Gunnar.  I'll try that.

At the same time, I wrote a function myself, which I totally stole from GOHyperG.
Just want to share it with everybody.  Please let me know if there are any bugs!

...Tao

=============================================================================

KEGGHyperG <-
function (geneIDs, lib = "hgu95av2") {
     getDataEnv <- function(name, lib) {
         get(paste(lib, name, sep = ""), mode = "environment")
     }
     require(lib, character.only = TRUE) || stop("need data package: ", lib)
     if (any(duplicated(geneIDs)))  stop("input IDs must be unique")
     keggV <- as.list(getDataEnv("PATH2PROBE", lib))

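     whWeHave <- sapply(keggV, function(y) {
         ## test length first; all() keeps the condition length-one,
         ## which `||` requires in recent versions of R
         if (length(y) == 0 || all(is.na(y)))
             return(FALSE)
         ids <- unique(unlist(y))
         any(geneIDs %in% ids)
     })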

     keggV <- keggV[whWeHave]
     ## get rid of control probes; lapply always returns a list, whereas
     ## sapply would simplify to a matrix if all pathways had equal length
     keggV <- lapply(keggV, function(x) x[!grepl("AFFX", x)])

     bad <- sapply(keggV, function(x) (length(x) == 1 && is.na(x)))
     keggV <- keggV[!bad]
     cIDs <- unique(unlist(keggV))
     nIDs <- length(cIDs)
     keggCounts <- sapply(keggV, length)
     ## drop NAs; unique() already removes duplicates
     ours <- unique(geneIDs[!is.na(geneIDs)])
     whGood <- ours[ours %in% cIDs]

     nInt <- length(whGood)
     if (nInt == 0)  { warning("no interesting genes found") }
     useCts <- sapply(keggV, function(x) sum(whGood %in% x))

     pvs <- phyper(useCts - 1, nInt, nIDs - nInt, keggCounts, lower.tail = FALSE)
     ord <- order(pvs)
     return(list(pvalues = pvs[ord], keggCounts = keggCounts[ord],
         chip = lib, kegg2Affy = keggV, intCounts = useCts[ord], numIDs = nIDs,
         numInt = nInt, intIDs = geneIDs))
}
=============================================================================
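
To make the phyper() call in the function above easier to check by hand, here
is a minimal sketch of the same calculation on toy data. The probe IDs, pathway
IDs and gene list are made up for illustration and do not come from any real
chip or annotation package; the variable names only loosely mirror those in
KEGGHyperG. It also cross-checks one p-value against a one-sided Fisher exact
test, which should give the same answer.

```r
## Toy data: hypothetical probe IDs and pathway membership (assumptions,
## not taken from hgu95av2 or any other annotation package)
pathways <- list(
    "00010" = c("p1", "p2", "p3", "p4"),
    "00020" = c("p3", "p5", "p6"),
    "00030" = c("p7", "p8", "p9", "p10")
)
interesting <- c("p1", "p2", "p3", "p7")

## Universe: every probe annotated to at least one pathway
universe <- unique(unlist(pathways))
nIDs <- length(universe)

## Interesting probes that are actually in the universe
good <- interesting[interesting %in% universe]
nInt <- length(good)

pathSize <- sapply(pathways, length)
hits <- sapply(pathways, function(x) sum(good %in% x))

## phyper(q, m, n, k, lower.tail = FALSE) returns P(X > q), so pass
## hits - 1 to get P(X >= hits), the over-representation p-value
pvs <- phyper(hits - 1, nInt, nIDs - nInt, pathSize, lower.tail = FALSE)

## Cross-check the first pathway with a one-sided Fisher exact test
## on the corresponding 2 x 2 table
tab <- matrix(c(hits[1],               nInt - hits[1],
                pathSize[1] - hits[1], nIDs - nInt - pathSize[1] + hits[1]),
              nrow = 2)
ft <- fisher.test(tab, alternative = "greater")$p.value

print(pvs[order(pvs)])
print(ft)
```

Note that the p-values are not adjusted for multiple testing; with many
pathways you would probably want to pass them through p.adjust().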




--- Gunnar Wrobel <bioc at gunnarwrobel.de> wrote:

> > Is there a similar function like GOHyperG that works on KEGG?  It seems there is no such thing
> > back in Feb. 05 (https://stat.ethz.ch/pipermail/bioconductor/2005-February/007532.html).  Any
> > updates?
> Hi Tao,
> 
> you might try to do this with goCluster. It does the same kind of
> calculation as GOHyperG but can use any kind of annotation.
> 
> Cheers
> 
> Gunnar
>


