[BioC] hypergeometric test in GOstats

Robert Gentleman rgentlem at fhcrc.org
Thu Mar 12 05:49:51 CET 2009

Hi Sebastien,
  It is expected that you do a little bit of the homework before posting.
Some things to try:

1) please read the posting guide. You need to give us information on your
particular version of R and Bioconductor, so that we can attempt to reproduce
the results you have.
2) you need to give a reproducible example, so we can test and make sure we are
getting the same answer as you do.  In this case you did not show us even the
call you used, so we have no way of knowing what was computed, and hence cannot
do more than speculate - which is a waste of everyone's time.
3) you need to read the documentation; there are manual pages and a vignette.
These are sometimes unclear and/or incomplete, and letting us know what is not
clear helps us to improve them.
4) you should check the mailing list archive to see if the topic has already
been discussed (this one has come up very many times), and then you can readily
obtain your answer.

So, some speculation: I suspect that your test is conditional.
If I understand your example, this is easily checked by a direct call to
fisher.test, whose result differs from the values that you did report, again
suggesting that your testing is conditional.
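
For reference, here is a minimal sketch of how that check can be set up from
the four counts in the original question. The variable names (n_selected,
n_universe, term_size, count) are made up for illustration and are not GOstats
arguments, and the row/column layout is my reading of the example. It builds
the 2x2 table, computes the plain (unconditional) hypergeometric upper tail
with phyper, and compares it with a one-sided fisher.test.

```r
# Counts reported in the original question (names are illustrative only)
n_selected <- 266    # geneIds
n_universe <- 1803   # universeGeneIds
term_size  <- 124    # universe genes annotated at the GO term
count      <- 55     # selected genes annotated at the GO term

# 2x2 table, filled column by column:
#   rows    = selected / not selected
#   columns = annotated at the term / not annotated
xm <- matrix(c(count,
               term_size - count,
               n_selected - count,
               n_universe - n_selected - (term_size - count)),
             nrow = 2)

# Unconditional hypergeometric test: P(X >= count) when drawing
# n_selected genes from a universe holding term_size annotated genes
p_hyper <- phyper(count - 1, term_size, n_universe - term_size,
                  n_selected, lower.tail = FALSE)

# The one-sided Fisher exact test computes the same tail probability
p_fisher <- fisher.test(xm, alternative = "greater")$p.value

# Expected count under the unconditional model (about 18.29)
expected <- n_selected * term_size / n_universe
```

If the numbers GOstats reports do not match p_hyper and expected, one likely
explanation (an assumption here, since the call was not shown) is that the test
was run with conditional = TRUE. In that case the counts entering the test at
each term are adjusted for the GO structure and no longer equal the raw values
used above.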

> xm
     [,1] [,2]
[1,]   55  211
[2,]   69 1468
> fisher.test(xm)

        Fisher's Exact Test for Count Data

data:  xm
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 3.701153 8.260393
sample estimates:
odds ratio

best wishes

Sebastien Gerega wrote:
> Hi,
> I would like to get a better understanding of exactly how the
> calculations in GOstats are performed.
> Does the package use a standard hypergeometric test?
> If I have the following values:
> geneIds = 266
> universeGeneIds = 1803
> and a particular GO term has:
> size = 124
> count = 55
> then the associated p-value, odds ratio and expected count are
> 1.519928e-16, 5.660429, and 18.81197 respectively.
> However, I would have thought the expected count would be 266 * 124 /
> 1803 = 18.29. In addition, the p-value obtained from the hypergeometric
> test using the following website http://keisan.casio.com/has10/SpecExec.cgi
> is different.
> Are there any steps performed by the GOstats package that make it
> different from a standard hypergeo test? What is the reason for these
> differences?
> thanks in advance,
> Sebastien
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
rgentlem at fhcrc.org
