[BioC] GOHyperG for KEGG

Shi, Tao shidaxia at yahoo.com
Wed Sep 14 19:32:58 CEST 2005


Hi Dick,

Glad to help!

I have a newer version of the function, since the old one uses probe set names as unique
identifiers.  Dr. Gentleman pointed out to me that this is not appropriate since you end up double
counting some.  The function below takes care of that.  Again, it was modified from GOHyperG and
I've added a new "KEGG" option.   The input 'x' is a set of Locus Link ID numbers.  

Please let me know if any problems.  Thanks,

...Tao


##=================================================================
GOKEGGHyperG <- function (x, lib = "hgu95av2", what = "MF") 
{
    getDataEnv <- function(name, lib) {
        get(paste(lib, name, sep = ""), mode = "environment")
    }
    require(lib, character.only = TRUE) || stop("need data package", lib)
    if (any(duplicated(x))) 
        stop("input IDs must be unique")
        
    ################  MODIFIED #################
    match.arg(what, c("MF", "BP", "CC", "KEGG"))
    ############################################
    
    cLLs <- unlist(as.list(getDataEnv("LOCUSID", lib)))
    ourLLs <- cLLs[match(x, cLLs)]
    
    ################## MODIFIED ###########################
    if (what=="KEGG") {
        goV <- as.list(getDataEnv("PATH2PROBE", lib))
    } else {
        goV <- as.list(getDataEnv("GO2ALLPROBES", lib))
    }
    ######################################################
    
    whWeHave <- sapply(goV, function(y) {
        if (is.na(y) || length(y) == 0) 
            return(FALSE)
        lls = unique(unlist(mget(y, getDataEnv("LOCUSID", lib), 
            ifnotfound = NA)))
        any(x %in% lls)
    })
    goV <- goV[whWeHave]
    
    ##################  MODIFIED #########################
    if (what!="KEGG") {
        goCat <- unlist(getGOOntology(names(goV)))
        goodGO <- goCat == what
        mm <- match(names(goCat), names(goV))
        mm <- mm[goodGO]
        goV <- goV[mm]
    }
    ######################################################
    
    
    goVcts = sapply(goV, function(x) {
        if (length(x) == 0 || is.na(x)) 
            return(NA)
        lls <- unique(unlist(mget(x, getDataEnv("LOCUSID", lib))))
        lls <- lls[!is.na(lls)]
        lls
    })
    bad <- sapply(goVcts, function(x) (length(x) == 1 && is.na(x)))
    goVcts = goVcts[!bad]
    goV = goV[!bad]
    cLLs <- unique(unlist(goVcts))
    nLL <- length(cLLs)
    goCounts <- sapply(goVcts, length)
    ourLLs <- unique(ourLLs[!is.na(ourLLs)])
    ours <- ourLLs[!duplicated(ourLLs)]
    whGood <- ours[ours %in% cLLs]
    nInt = length(whGood)
    if (nInt == 0) 
        warning("no interesting genes found")
    useCts <- sapply(goVcts, function(x) sum(whGood %in% x))
    pvs <- phyper(useCts - 1, nInt, nLL - nInt, goCounts, lower.tail = FALSE)
    ord <- order(pvs)
    
    ################################  MODIFIED ##################################################
    if (what!="KEGG") {
        return(list(pvalues = pvs[ord], goCounts = goCounts[ord], chip = lib, go2Affy = goV, 
        intCounts = useCts[ord], numLL = nLL, numInt = nInt, intLLs = x))
    } else {
        return(list(pvalues = pvs[ord], keggCounts = goCounts[ord], chip = lib, kegg2Affy = goV, 
        intCounts = useCts[ord], numLL = nLL, numInt = nInt, intLLs = x))
    }
    ################################  MODIFIED ##################################################
    
}
#=====================================================================


--- Dick Beyer <dbeyer at u.washington.edu> wrote:

> Hi Tao,
> 
> I tried out your KEGGHyperG function.  Seems to work just great.  Thanks very much,
> Dick
> 
> *******************************************************************************
> Richard P. Beyer, Ph.D.	University of Washington
> Tel.:(206) 616 7378	Env. & Occ. Health Sci. , Box 354695
> Fax: (206) 685 4696	4225 Roosevelt Way NE, # 100
>  			Seattle, WA 98105-6099
> http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
> http://staff.washington.edu/~dbeyer
> *******************************************************************************
> 
> --- "Shi, Tao" <shidaxia at yahoo.com> wrote:
> 
> Message: 8
> Date: Tue, 6 Sep 2005 10:27:05 -0700 (PDT)
> From: "Shi, Tao" <shidaxia at yahoo.com>
> Subject: Re: [BioC] GOHyperG for KEGG
> To: Gunnar Wrobel <bioc at gunnarwrobel.de>
> Cc: bioconductor at stat.math.ethz.ch
> Message-ID: <20050906172705.66173.qmail at web52703.mail.yahoo.com>
> Content-Type: text/plain; charset=iso-8859-1
> 
> Thank you very much, Gunnar.  I'll try that.
> 
> At same time, I wrote a function by myself, which I totally stole from GOHyperG.
> Just want to
> share it with everybody.  Please let me know if there are any bugs!
> 
> ...Tao
> 
> =============================================================================
> 
> KEGGHyperG <-
> function (geneIDs, lib = "hgu95av2") {
>      getDataEnv <- function(name, lib) {
>          get(paste(lib, name, sep = ""), mode = "environment")
>      }
>      require(lib, character.only = TRUE) || stop("need data package", lib)
>      if (any(duplicated(geneIDs)))  stop("input IDs must be unique")
>      keggV <- as.list(getDataEnv("PATH2PROBE", lib))
> 
>      whWeHave <- sapply(keggV, function(y) {
>          if (is.na(y) || length(y) == 0)
>              return(FALSE)
>          ids = unique(unlist(y))
>          any(geneIDs %in% ids)
>      })
> 
>      keggV <- keggV[whWeHave]
>      keggV <- sapply(keggV, function(x) {
>          if(any(grep("AFFX",x))) {
>              return(x[-grep("AFFX",x)])
>          } else {
>              return(x)
>          }
>      } ) ## get rid of control probes
> 
>      bad <- sapply(keggV, function(x) (length(x) == 1 && is.na(x)))
>      keggV <- keggV[!bad]
>      cIDs <- unique(unlist(keggV))
>      nIDs <- length(cIDs)
>      keggCounts <- sapply(keggV, length)
>      ourIDs <- unique(geneIDs[!is.na(geneIDs)])
>      ours <- ourIDs[!duplicated(ourIDs)]
>      whGood <- ours[ours %in% cIDs]
> 
>      nInt = length(whGood)
>      if (nInt == 0)  { warning("no interesting genes found") }
>      useCts <- sapply(keggV, function(x) sum(whGood %in% x))
> 
>      pvs <- phyper(useCts - 1, nInt, nIDs - nInt, keggCounts, lower.tail = FALSE)
>      ord <- order(pvs)
>      return(list(pvalues = pvs[ord], keggCounts = keggCounts[ord],
>          chip = lib, kegg2Affy = keggV, intCounts = useCts[ord], numIDs = nIDs,
>          numInt = nInt, intIDs = geneIDs))
> }
> ================================================================================
> ==
> 
> 
> 
> 
> --- Gunnar Wrobel <bioc at gunnarwrobel.de> wrote:
> 
> > > Is there a similar function like GOHyperG that works on KEGG?  It seems
> there is no such thing
> > > back in Feb. 05
> (https://stat.ethz.ch/pipermail/bioconductor/2005-February/007532.html).  Any
> > > updates?
> > Hi Tao,
> > 
> > you might try to do this with goCluster. It does the same kind of
> > calculation as GOHyperG but can use any kind of annotation.
> > 
> > Cheers
> > 
> > Gunnar
> >
> 
> 
> 
> 
> 
> 
>



More information about the Bioconductor mailing list