[R] Multivariate hypergeometric distribution version of phyper()

Karl Brand k.brand at erasmusmc.nl
Tue Mar 30 16:02:01 CEST 2010


Dear R Users,

I employed the phyper() function to estimate the likelihood that the 
number of genes overlapping between 2 different lists of genes is due to 
chance. This appears to work appropriately.

Now i want to try this with 3 lists of genes which phyper() does not 
appear to support.

Some googling suggests i can utilize the Multivariate hypergeometric 
distribution to achieve this. eg.:

http://en.wikipedia.org/wiki/Hypergeometric_distribution

But when i try to do this manually using the choose() function (see 
attempt below example with just two gene lists) i'm unable to perform 
the calculations- the numbers hit infinity before getting an answer.

Searching cran archives for "Multivariate hypergeometric" show this term 
in the vignettes of package's ‘combinat’ and ‘forward’. But i'm unable 
to make sense of the these pachakege functions in the context of my 
aforementioned apllication.

Can some one suggest a function, script or method to achieve my goal of 
estimating the likelyhood of overlap between 3 lists of genes, ideally 
using the multivariate hypergeometric, or anything else for that matter?

cheers in advance,

Karl



#example attempt with two gene lists m & n
N <- 45101 # total number balls in urn
m <- 720   # number of 'white' or 'special' balls in urn, aka 'success'
n <- 801   # number balls drawn or number of samples
k <- 40    # number of 'white' or 'special' balls DRAWN

a <- choose(m,k)
b <- choose((N-m),(n-k))
z <- choose(N,n)
prK <- (a*b)/z #'the answer'
print(prK)
[1] NaN

 > a
[1] 7.985852e+65
 > b
[1] Inf
 > z
[1] Inf


-- 
Karl Brand
Department of Genetics
Erasmus MC
Dr Molewaterplein 50
3015 GE Rotterdam
T +31 (0)10 704 3457 | F +31 (0)10 704 4743 | M +31 (0)642 777 268



More information about the R-help mailing list