[BioC] GOstats::geneIdUniverse() and its relation to organism annotation db's

Steve Lianoglou mailinglist.honeypot at gmail.com
Mon Sep 6 11:21:59 CEST 2010


I've been trying to do some non-run-of-the-mill gene ontology analysis
and have been messing around with the innards of the results we get
from GOstats::hyperGTest.

As a result, I've found some peculiarities in the GO annotations I'm
retrieving from the hyperGTest and how they relate to the GO
annotations in my organism's (saccharomyces cerevisiae) annotation

In particular: for some enriched GO term X, I'm finding genes listed
in the geneIdUniverse of X that do not have X as a member of their
organism.db.GO map, and I'm not sure how that could be. I'm even
setting condition=FALSE to my hyperGTest to make it as "vanilla" as
possible (but as far as I understand the conditional test, this
wouldn't effect what I'm seeing anyway since genes are removed from GO
terms, and not added).

I have a specific example below. You'll find that for the first
significant GOID found (GO:0007059), gene "YKL049C" is listed as a
member of this GOID's geneIdUniverse as it is returned from the
hyperGTest, but GO:0007059 isn't listed in my annotation db GO mapping
(org.Sc.sgdGO) for this gene, nor is "YKL049C" found in its GO2ORF

=== Example ===


orfs <- c(
  "YGR084C", "YDR409W",   "YFR017C",   "YDR342C",   "YBR152W",
  "YIL014W", "YGR134W",   "YDL109C",   "YDL234C",   "YDR524C",
  "YBR002C", "YGR063C",   "YDR034C-A", "YIR011C",
  "YHR175W", "YHR109W",   "YGR207C",   "YCR023C",   "YER039C-A",
  "YGL064C", "YGR249W",   "YHR172W",   "YGL226W",   "YAL064W",
  "YDR309C", "YDR473C",   "YAL058W",   "YHL030W",   "YKL101W",
  "YJL122W", "YBR121C",   "YIL018W",   "YDR443C",   "YBR069C",
  "YBR011C", "YHR009C",   "YGL166W",   "YIL103W",   "YDR206W",
  "YBL031W", "YBR162W-A", "YFL018C",   "YBR214W",   "YIL076W",
  "YBR291C", "YKL049C",   "YIL149C",   "YDR034W-B", "YDR395W")

orf2go <- getAnnMap('GO', 'org.Sc.sgd')
go2orf <- getAnnMap('GO2ORF', 'org.Sc.sgd')
all.orfs <- keys(orf2go)

## Do GOstats test to get significant GO ontologies
params <- new("GOHyperGParams", geneIds=orfs,
              universeGeneIds=all.orfs, annotation='org.Sc.sgd',
              ontology='BP', pvalueCutoff=0.05,
              conditional=FALSE, testDirection='over')
GO <- hyperGTest(params)

## Look at the genes annotated in the universe for each GOID and compare
## with annotations for each gene in org.Sc.sgdGO
universes <- geneIdUniverse(GO)
uid <- names(universes)[1] ## GO:0007059
uid.members <- universes[[uid]]
"YKL049C" %in% uid.members ## TRUE

## Is the gene listed in GOID's (uid) universe annotated with that GOID?
## Given previous result, this *should* be TRUE
go.ids <- orf2go[["YKL049C"]]
uid %in% names(go.ids) ## FALSE

## Does that GOID have this gene as its member?
## This also should be TRUE
check.orfs <- go2orf[[uid]]
"YKL049C" %in% check.orfs ## FALSE


Since `"YKL049C" %in% uid.members` is TRUE, I would expect the go term
`uid` (GO:0007059) to be in both `names(go.ids)` and `check.orfs`, but
it's not.

Can someone shed some light on this for me?

R version 2.11.1 (2010-05-31)

[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] GO.db_2.4.1          annotate_1.26.1      org.Sc.sgd.db_2.4.1
 [5] RSQLite_0.9-2        DBI_0.2-5            graph_1.26.0
 [9] AnnotationDbi_1.10.2 Biobase_2.8.0        devtools_0.1
[13] roxygen_0.1-2        profr_0.1.1          digest_0.4.2

loaded via a namespace (and not attached):
 [1] GSEABase_1.10.0   RBGL_1.24.0       XML_3.1-1
evaluate_0.3      genefilter_1.30.0
 [6] plyr_1.1          splines_2.11.1    survival_2.35-8
tools_2.11.1      xtable_1.5-6


Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

More information about the Bioconductor mailing list