[BioC] GOstats - internal filtering?

James W. MacDonald jmacdon at uw.edu
Mon Apr 23 16:42:31 CEST 2012


Hi Andrew,

On 4/20/2012 10:50 PM, Andrew Jaffe wrote:
> Hopefully I can get a quick answer to this question about GOstats.
>
> I'm trying to calculate enrichment for every GO category using the GOstats
> package. I would assume that setting the p-value cutoff = 1 with
> conditional=FALSE would give me an enrichment odds ratio/p-value for every
> GO category in, say, the BP ontology. However, this does not seem to be the
> case, as the number of categories returned seems to be a function of the
> geneIds supplied:
>
>> params = new("GOHyperGParams", geneIds = y$ENTREZID[y$p < 0.001],
> + universeGeneIds = y$ENTREZID,
> + annotation = "hgu133plus2.db",
> + ontology = "BP", pvalueCutoff = 1, conditional = FALSE,
> + testDirection="over")
>> ht=hyperGTest(params)
>> nrow(summary(ht))
> [1] 6080
>
>> params2 = new("GOHyperGParams", geneIds = y$ENTREZID[y$p < 0.01],
> + universeGeneIds = y$ENTREZID,
> + annotation = "hgu133plus2.db",
> + ontology = "BP", pvalueCutoff = 1, conditional = FALSE,
> + testDirection="over")
>> ht2=hyperGTest(params2)
>> nrow(summary(ht2))
> [1] 7856
>
> Does the hyperGTest function drop GO categories without any genes in them
> prior to returning the results table? Or is something else going on?

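Technically, yes. The only GO terms that are tested are those that arise
from mapping your selected Entrez Gene IDs (the geneIds) to GO terms, so a
category that contains none of the selected genes is never tested and does
not appear in the summary() output.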
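To make the behaviour concrete, here is a minimal base-R sketch using a made-up gene-to-GO mapping (the gene IDs g1..g5 and terms GO:A..GO:E are hypothetical, not real annotation). It is not the GOstats implementation and ignores details such as how annotations are propagated through the GO hierarchy; it only shows that when the tested categories are the union of terms reached by the selected genes, adding selected genes adds categories, and that an untested category would have had an over-representation p-value of 1 anyway.

```r
# Hypothetical toy annotation: gene -> GO terms (not real data)
gene2go <- list(
  g1 = c("GO:A", "GO:B"),
  g2 = c("GO:B"),
  g3 = c("GO:C"),
  g4 = c("GO:C", "GO:D"),
  g5 = c("GO:E")
)
universe <- names(gene2go)

# Invert the mapping: GO term -> universe genes annotated to it
go2gene <- split(rep(names(gene2go), lengths(gene2go)), unlist(gene2go))

# Test only the categories that at least one selected gene maps to
test_categories <- function(selected) {
  tested <- sort(unique(unlist(gene2go[selected])))
  N <- length(universe)  # universe size
  k <- length(selected)  # number of selected genes
  vapply(tested, function(cat) {
    m <- length(go2gene[[cat]])             # category size in the universe
    x <- sum(selected %in% go2gene[[cat]])  # selected genes in the category
    # Over-representation p-value: P(X >= x), hypergeometric null
    phyper(x - 1, m, N - m, k, lower.tail = FALSE)
  }, numeric(1))
}

res1 <- test_categories(c("g1", "g2"))        # tests GO:A, GO:B
res2 <- test_categories(c("g1", "g2", "g3"))  # also tests GO:C
print(res1)
print(length(res2))

# A category with no selected genes (x = 0) gets P(X >= 0) = 1
print(phyper(-1, 1, 4, 2, lower.tail = FALSE))
```

In this toy example, the larger selected set reaches more GO terms and therefore yields more rows. The same pattern appears with the p < 0.01 versus p < 0.001 gene lists.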

Best,

Jim


>
> Thanks,
> Andrew
>
>> sessionInfo()
> R version 2.15.0 Patched (2012-04-20 r59123)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.iso885915       LC_NUMERIC=C
>   [3] LC_TIME=en_US.iso885915        LC_COLLATE=en_US.iso885915
>   [5] LC_MONETARY=en_US.iso885915    LC_MESSAGES=en_US.iso885915
>   [7] LC_PAPER=C                     LC_NAME=C
>   [9] LC_ADDRESS=C                   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base
>
> other attached packages:
>   [1] GO.db_2.7.1          sva_3.2.0            mgcv_1.7-13
>   [4] corpcor_1.6.2        hgu133plus2.db_2.7.1 genefilter_1.38.0
>   [7] RColorBrewer_1.0-5   GOstats_2.22.0       Category_2.22.0
> [10] org.Hs.eg.db_2.7.1   RSQLite_0.11.1       DBI_0.2-5
> [13] funxBox_0.1          digest_0.5.2         multtest_2.12.0
> [16] GSEABase_1.18.0      graph_1.34.0         annotate_1.34.0
> [19] AnnotationDbi_1.18.0 limma_3.12.0         Biobase_2.16.0
> [22] BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
>   [1] grid_2.15.0      IRanges_1.14.2   lattice_0.20-6   MASS_7.3-17
>   [5] Matrix_1.0-6     nlme_3.1-103     RBGL_1.32.0      splines_2.15.0
>   [9] stats4_2.15.0    survival_2.36-12 tools_2.15.0     XML_3.9-4
> [13] xtable_1.7-0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


