[Rd] dhyper, phyper (PR#10853)

Wed Feb 27 09:05:07 CET 2008

Aloha all,

I know too little about what I'm about to write and hope I'm not  
wasting your time.

For a class I'm teaching in archaeological data analysis, I'm trying
to put together a routine that calculates the so-called Petersen
index and, especially, confidence intervals for it.  The index was
introduced to archaeologists by N.R.J. Fieller and A. Turner in a
1982 article in the Journal of Archaeological Science, "Number
Estimation in Vertebrate Samples".  They say that "calculation of
precise confidence intervals for population sizes is, in principle at  
least, straightforward.  It involves calculation of cumulative  
hypergeometric probabilities (i.e. the summation of probabilities  
given by equation 3.1 of Seber, 1973)."  The reference is to G.A.F.  
Seber's book, The Estimation of Animal Abundance.

I went to equation 3.1 and wrote a small function to sum its  
probabilities, modeled after phyper() and taking the arguments in the  
same order (the names have changed to suit the archaeological  
situation):

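> # Sum, for x = 0, ..., p, of
> #   choose(l, x) * choose(n - l, r - x) / choose(n, r),
> # computed on the log scale so large binomial coefficients do not overflow.
> seber <- function(p, l, n, r)
>   {
>     y <- 0
>     for (x in 0:p)
>       y <- y + exp(lchoose(l, x) + lchoose(n - l, r - x) - lchoose(n, r))
>     y
>   }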
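
Written out from the code above (my transcription of what seber() computes, not a quotation of Seber's equation 3.1), the quantity returned is the cumulative sum:

```latex
\[
  \Pr(X \le p) \;=\; \sum_{x=0}^{p}
    \frac{\binom{l}{x}\,\binom{n-l}{r-x}}{\binom{n}{r}}
\]
```

In this form n enters as the total, and the second group has n - l members.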

When used in the larger routine, this yields results that very
closely approximate the results in Fieller and Turner's Table 1.

I initially thought I could use the function phyper() for this  
because, as I interpret the help files, this routine yields  
cumulative hypergeometric probabilities.  But I'm finding that it  
gives different results than seber().
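
A minimal sketch of the comparison follows. The input values are made up purely for illustration and are not taken from Fieller and Turner's table. The last call rests on an assumption drawn from the Hypergeometric help page: that the third argument of phyper() counts only the items outside the first group (the "black balls"), not the total.

```r
# Same sum as seber() above, written in vectorized form.
seber <- function(p, l, n, r) {
  x <- 0:p
  sum(exp(lchoose(l, x) + lchoose(n - l, r - x) - lchoose(n, r)))
}

# Hypothetical values, chosen only for illustration.
p <- 3; l <- 10; n <- 50; r <- 12

seber(p, l, n, r)         # n taken as the total
phyper(p, l, n, r)        # same arguments passed straight to phyper()
phyper(p, l, n - l, r)    # third argument passed as n - l instead
```

If that reading of the help page is right, the first and third calls should agree and the second should differ, which would point to a difference in parameterization.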

I apologize if I am in over my head, but I am wondering whether
this is a bug in dhyper/phyper.  Perhaps I have misunderstood what
phyper() actually does, or am calling it incorrectly?  Or, were  
Fieller and Turner in error?

All the best,
Tom

Thomas Dye
Dean Hall 201, Tuesday 1:00-1:55 pm