[BioC] Bug in hyperGTest for KEGGHyperGParams?

James F. Reid james.reid at ifom-ieo-campus.it
Tue Jul 5 19:05:52 CEST 2011


Hi Jenny,

I think you'll find the answer in this thread:
https://stat.ethz.ch/pipermail/bioconductor/2010-May/033439.html

Best,
J.

On 07/05/2011 06:13 PM, Jenny Drnevich wrote:
> Hi all,
>
> I'm doing both GO and KEGG over-representation testing on several
> different lists of genes, using the same background set for each list.
> What's got me puzzled is the difference in the "Gene universe size"
> reported from the hyperGTest results for each list from the KEGG test,
> even though they have the same background set. When I make a
> GOHyperGParams object for each list and test them, the results report
> the same "Gene universe size" for each list, which I assume to be the
> number of genes in the background that have any GO MF terms. However,
> for the KEGG test, each list reports a different "Gene universe size",
> so I'm unsure how selecting a different list from the same background
> can change the mapping of the background to KEGG terms. I haven't been
> able to trace the code that runs when hyperGTest is called on a
> KEGGHyperGParams object, so I don't know what is going on. Is it a bug,
> or is this expected behavior for KEGG terms? A reproducible example and
> sessionInfo() are below.
>
> Thanks,
> Jenny
>
>  > library(annaffy)
> Loading required package: Biobase
>
> Welcome to Bioconductor
>
> Vignettes contain introductory material. To view, type
> 'browseVignettes()'. To cite Bioconductor, see
> 'citation("Biobase")' and for packages 'citation("pkgname")'.
>
> Loading required package: GO.db
> Loading required package: AnnotationDbi
> Loading required package: DBI
>
> Loading required package: KEGG.db
>
>  > library(porcine.db)
> Loading required package: org.Ss.eg.db
>
>
>  > library(GOstats)
> Loading required package: Category
> Loading required package: graph
>  >
>  >
>  > all.ids <- Rkeys(porcineENTREZID)
>  > length(all.ids)
> [1] 30160
>  >
>  >
>  > set.seed(1234)
>  > list1 <- sample(all.ids,5000)
>  > list2 <- list1[1:1000]
>  > list3 <- list1[4501:5000]
>  >
>  > par.MF.list <- list(
> +   list1 = new("GOHyperGParams", geneIds = list1, universeGeneIds = all.ids,
> +     ontology = "MF", annotation = "porcine.db", testDirection = "over",
> +     pvalueCutoff = 0.01, conditional = FALSE),
> +   list2 = new("GOHyperGParams", geneIds = list2, universeGeneIds = all.ids,
> +     ontology = "MF", annotation = "porcine.db", testDirection = "over",
> +     pvalueCutoff = 0.01, conditional = FALSE),
> +   list3 = new("GOHyperGParams", geneIds = list3, universeGeneIds = all.ids,
> +     ontology = "MF", annotation = "porcine.db", testDirection = "over",
> +     pvalueCutoff = 0.01, conditional = FALSE))
>  >
>  > hg.MF.list <- lapply(par.MF.list,hyperGTest)
>  > hg.MF.list
> $list1
> Gene to GO MF test for over-representation
> 1007 GO MF ids tested (1 have p < 0.01)
> Selected gene set size: 569
> Gene universe size: 3198
> Annotation package: porcine
>
> $list2
> Gene to GO MF test for over-representation
> 419 GO MF ids tested (6 have p < 0.01)
> Selected gene set size: 106
> Gene universe size: 3198
> Annotation package: porcine
>
> $list3
> Gene to GO MF test for over-representation
> 266 GO MF ids tested (2 have p < 0.01)
> Selected gene set size: 63
> Gene universe size: 3198
> Annotation package: porcine
>
> #Note the Gene universe size is 3198 for all 3 lists
>
>  >
>  >
>  > par.KEGG <- list(
> +   list1 = new("KEGGHyperGParams", geneIds = list1, universeGeneIds = all.ids,
> +     annotation = "porcine.db", testDirection = "over", pvalueCutoff = 0.01),
> +   list2 = new("KEGGHyperGParams", geneIds = list2, universeGeneIds = all.ids,
> +     annotation = "porcine.db", testDirection = "over", pvalueCutoff = 0.01),
> +   list3 = new("KEGGHyperGParams", geneIds = list3, universeGeneIds = all.ids,
> +     annotation = "porcine.db", testDirection = "over", pvalueCutoff = 0.01))
>  >
>  > hg.KEGG <- lapply(par.KEGG,hyperGTest)
>  > hg.KEGG
> $list1
> Gene to KEGG test for over-representation
> 190 KEGG ids tested (3 have p < 0.01)
> Selected gene set size: 280
> Gene universe size: 1629
> Annotation package: porcine
>
> $list2
> Gene to KEGG test for over-representation
> 105 KEGG ids tested (1 have p < 0.01)
> Selected gene set size: 54
> Gene universe size: 1363
> Annotation package: porcine
>
> $list3
> Gene to KEGG test for over-representation
> 87 KEGG ids tested (1 have p < 0.01)
> Selected gene set size: 30
> Gene universe size: 1204
> Annotation package: porcine
>
> # Now there are 3 different Gene universe sizes: 1629, 1363 and 1204. WHY?
>
>  >
>  >
>  > sessionInfo()
> R version 2.13.0 (2011-04-13)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] GOstats_2.18.0 graph_1.30.0 Category_2.18.0 porcine.db_2.4.7 org.Ss.eg.db_2.5.0 annaffy_1.24.0
> [7] KEGG.db_2.5.0 GO.db_2.5.0 RSQLite_0.9-4 DBI_0.2-5 AnnotationDbi_1.14.1 Biobase_2.12.1
>
> loaded via a namespace (and not attached):
> [1] annotate_1.30.0 genefilter_1.34.0 GSEABase_1.14.0 RBGL_1.28.0 splines_2.13.0 survival_2.36-5 tools_2.13.0
> [8] XML_3.4-0.2 xtable_1.5-6
>
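The assumption in the question, that the reported universe is the set of background genes carrying at least one annotation, can be checked by hand. Below is a minimal sketch with made-up toy data and base R only. The names toy.annot, annotated.universe and over.p and all gene and pathway IDs are hypothetical; they are not part of GOstats or Category, and the p-value is the textbook hypergeometric upper tail, not necessarily what hyperGTest does internally. It only illustrates that, under this definition, the universe does not depend on the selected list, and that the same hit count gives a different p-value once the universe size changes.

```r
# Toy annotation: pathway ID -> member genes (all IDs are made up).
toy.annot <- list(
  path1 = c("g1", "g2", "g3"),
  path2 = c("g3", "g4"),
  path3 = c("g5", "g6", "g7")
)

# Background of 10 genes; g8, g9 and g10 carry no annotation.
background <- paste0("g", 1:10)

# Universe under the assumed definition: background genes with >= 1 annotation.
# Note that the selected gene list is not an argument here.
annotated.universe <- function(background, annot) {
  intersect(background, unique(unlist(annot, use.names = FALSE)))
}

# Upper-tail hypergeometric p-value for one category (over-representation).
over.p <- function(selected, category, universe) {
  selected <- intersect(selected, universe)   # drop unannotated selected genes
  category <- intersect(category, universe)
  hits <- length(intersect(selected, category))
  phyper(hits - 1, length(category),
         length(universe) - length(category),
         length(selected), lower.tail = FALSE)
}

universe <- annotated.universe(background, toy.annot)
length(universe)                                   # 7 annotated genes

selected <- c("g1", "g2", "g8")
p.full  <- over.p(selected, toy.annot$path1, universe)       # universe of 7
p.small <- over.p(selected, toy.annot$path1, universe[1:5])  # universe of 5
c(p.full, p.small)                                 # 1/7 and 3/10
```

With the real data, comparing length(universe) computed this way against the "Gene universe size" printed for each list would show whether the KEGG test is using this definition or a list-dependent one.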


