[R] A question about the hypergeometric distribution and phyper()

Wed Sep 10 15:19:38 CEST 2008

Dear All

I have a question about the hypergeometric distribution.

Example 1: I have a universe of 6187 objects, of which 164 have a particular
attribute, so 6187-164 do not.  I sample 249 of those objects and find
that 19 of them have the attribute.  Testing for over-representation only,
I get this p-value:

phyper(19, 164, 6187-164, 249, lower.tail=FALSE)
[1] 7.816235e-06

Example 2: I have a universe of 6187 objects, of which 12 have a particular
attribute, so 6187-12 do not.  I sample 249 of those objects and find
that 4 of them have the attribute.  Testing for over-representation only,
I get this p-value:

phyper(4, 12, 6187-12, 249, lower.tail=FALSE)
[1] 6.368919e-05
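For reference, R's help page for phyper() (?Hypergeometric) states that with lower.tail = FALSE the result is P[X > q], so the two calls above leave the observed count itself out of the tail. The sketch below assumes the intended over-representation p-value is P(X >= observed): it passes observed - 1 as q and cross-checks the result by summing dhyper() point probabilities. The helper name upper_tail() is hypothetical; the arguments m, n and k follow the naming in ?phyper.

```r
# Hypergeometric setup in phyper()'s notation:
#   m = objects with the attribute, n = objects without it, k = sample size
# phyper(q, m, n, k, lower.tail = FALSE) returns P(X > q), not P(X >= q).

# Hypothetical helper: P(X >= x), the upper tail including x itself
upper_tail <- function(x, m, n, k) {
  phyper(x - 1, m, n, k, lower.tail = FALSE)
}

N <- 6187
k <- 249

# Example 1: 164 with the attribute, 19 observed in the sample
p1 <- upper_tail(19, 164, N - 164, k)
# Example 2: 12 with the attribute, 4 observed in the sample
p2 <- upper_tail(4, 12, N - 12, k)

# Cross-check: sum point probabilities from the observed count upward
p1_sum <- sum(dhyper(19:min(164, k), 164, N - 164, k))
p2_sum <- sum(dhyper(4:min(12, k), 12, N - 12, k))

print(c(example1 = p1, example2 = p2))
# Stops with an error if the two ways of computing the tail disagree
stopifnot(isTRUE(all.equal(p1, p1_sum)), isTRUE(all.equal(p2, p2_sum)))
```

The same one-sided P(X >= x) is what fisher.test(..., alternative = "greater") reports for the corresponding 2x2 table, which can serve as a further check.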

It seems counter-intuitive to me that the p-value for seeing 19 of the 164
in a sample of 249 is smaller than the p-value for seeing 4 of the 12 in a
sample of the same size.
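One way to put the two examples on a common footing: the p-value depends on where the observed count falls in the distribution of X under random sampling, not on the fractions 19/164 and 4/12 directly. A rough summary of that distribution is its mean, k*m/(m+n), which comes out near 6.60 for Example 1 and 0.483 for Example 2, and its standard deviation. The sketch below uses the standard hypergeometric mean and variance; the helper name hyper_moments() is hypothetical, and the z column is only a loose yardstick because X is discrete and strongly skewed when its mean is below 1.

```r
# Hypothetical helper: mean and standard deviation of X, the number of
# attribute-carrying objects in a sample of size k drawn without
# replacement from m objects with the attribute and n without it.
hyper_moments <- function(m, n, k) {
  N <- m + n
  p <- m / N
  mu <- k * p
  # Binomial variance times the finite-population correction (N - k) / (N - 1)
  v <- k * p * (1 - p) * (N - k) / (N - 1)
  c(expected = mu, sd = sqrt(v))
}

N <- 6187
k <- 249
moments <- rbind(example1 = hyper_moments(164, N - 164, k),
                 example2 = hyper_moments(12,  N - 12,  k))
observed <- c(example1 = 19, example2 = 4)

# Observed count relative to the random-sampling expectation (fold) and
# its distance above the mean in standard deviations (z)
summary_tab <- cbind(moments,
                     observed = observed,
                     fold     = observed / moments[, "expected"],
                     z        = (observed - moments[, "expected"]) / moments[, "sd"])
print(summary_tab)
```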

First, am I using phyper() correctly?
Second, can someone point me to documentation explaining why these
seemingly counter-intuitive p-values occur?

Thanks
Mick