[BioC] new topGO results using GO.db very different from old ones using GO

Joern Toedling toedling at ebi.ac.uk
Fri May 2 16:48:20 CEST 2008


Dear all,
I would appreciate any suggestions on the following issue. I have noticed 
a major inconsistency between new and older topGO results. For the older 
ones, topGO used the "GO" package, while it uses "GO.db" for the new 
results. I can't figure out whether it is a problem with topGO only or 
whether there are some serious inconsistencies between GO and GO.db.

Here is the source code I used:

library("topGO")

## load list of genes of interest

load("brainOnlyGenes.RData")

## load general gene-to-GO mapping and universe of genes to use in analysis:

load("mm9gene2GO.RData")

load("arrayGenesWithGO.RData")

## then the function to call topGO and to return a nice result table:

sigGOTable <- function(selGenes, GOgenes=arrayGenesWithGO,
                       gene2GO=mm9.gene2GO[arrayGenesWithGO],
                       ontology="BP", maxP=0.001)
{
  inGenes <- factor(as.integer(GOgenes %in% selGenes))
  names(inGenes) <- GOgenes
  GOdata <- new("topGOdata", ontology=ontology, allGenes=inGenes,
                annot=annFUN.gene2GO, gene2GO=gene2GO)
  myTestStat <- new("elimCount", testStatistic=GOFisherTest,
                    name="Fisher test", cutOff=maxP)
  mySigGroups <- getSigGroups(GOdata, myTestStat)
  sTab <- GenTable(GOdata, mySigGroups, topNodes=length(usedGO(GOdata)))
  names(sTab)[length(sTab)] <- "p.value"
  return(subset(sTab, as.numeric(p.value) < maxP))
}

## call it:

(brainRes <- sigGOTable(brainOnlyGenes))

# with  topGO_1.4.0  using GO_2.0.1

# this is:

#         GO.ID                  Term Annotated Significant Expected p.value
#  1 GO:0007268 synaptic transmission       136          44    24.46 3.0e-05
#  2 GO:0007610              behavior       180          54    32.38 4.4e-05
#  3 GO:0007409          axonogenesis       119          38    21.41 0.00014
#  4 GO:0006887            exocytosis        40          17     7.20 0.00026
#  5 GO:0007420     brain development       136          40    24.46 0.00066


# which kind of makes sense as an annotation of a list of interesting genes when investigating brain cells

## now unfortunately, using the same gene list, universe and gene-to-GO mapping, and the same function as above

##  with topGO_1.9.0  using GO.db_2.2.0, the result is:

#        GO.ID                                Term Annotated Significant Expected  p.value
# 1 GO:0007268    mitochondrial genome maintenance       137          44    24.65  3.7e-05
# 2 GO:0007610                        reproduction       180          54    32.39  4.4e-05
# 3 GO:0007409          single strand break repair       119          38    21.41  0.00014
# 4 GO:0006887     regulation of DNA recombination        40          17     7.20  0.00026
# 5 GO:0007420 regulation of mitotic recombination       136          40    24.47  0.00066


# which is obviously very, very different
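
A side-by-side comparison of the two tables may help narrow this down. The sketch below uses only base R and the values copied from the two result tables above; the names oldRes, newRes and cmp are made up for illustration and are not part of topGO. It merges the two tables by GO.ID and flags which columns agree.

```r
## Values copied from the two result tables above (hypothetical object names)
goIDs <- c("GO:0007268", "GO:0007610", "GO:0007409", "GO:0006887", "GO:0007420")

## topGO_1.4.0 using GO_2.0.1
oldRes <- data.frame(
  GO.ID       = goIDs,
  Term        = c("synaptic transmission", "behavior", "axonogenesis",
                  "exocytosis", "brain development"),
  Annotated   = c(136, 180, 119, 40, 136),
  Significant = c(44, 54, 38, 17, 40),
  stringsAsFactors = FALSE
)

## topGO_1.9.0 using GO.db_2.2.0
newRes <- data.frame(
  GO.ID       = goIDs,
  Term        = c("mitochondrial genome maintenance", "reproduction",
                  "single strand break repair",
                  "regulation of DNA recombination",
                  "regulation of mitotic recombination"),
  Annotated   = c(137, 180, 119, 40, 136),
  Significant = c(44, 54, 38, 17, 40),
  stringsAsFactors = FALSE
)

## Join on GO.ID and compare the remaining columns pairwise
cmp <- merge(oldRes, newRes, by = "GO.ID", suffixes = c(".old", ".new"))
cmp$sameTerm        <- cmp$Term.old == cmp$Term.new
cmp$sameAnnotated   <- cmp$Annotated.old == cmp$Annotated.new
cmp$sameSignificant <- cmp$Significant.old == cmp$Significant.new

print(cmp[, c("GO.ID", "Term.old", "Term.new",
              "sameTerm", "sameAnnotated", "sameSignificant")])
```

In this comparison the GO IDs and the Significant counts are identical in both runs and the Annotated counts differ for only one ID, while every Term differs. That would suggest the difference lies in how IDs are mapped to term names rather than in the enrichment test itself, although this is only a guess. With GO.db installed, Term(GOTERM[["GO:0007268"]]) should return the term name stored for that ID, which could be checked against the Term column.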


Does anyone have an educated guess as to what is going on? Could it be a 
bug in topGO? Or is the information in GO.db really different from that 
in GO, and in that case, which one is the right one?

Best regards,
Joern


