[R] Goodness of fit to Poisson / NegBinomial

Mark Myatt mark at myatt.demon.co.uk
Thu Feb 8 11:13:09 CET 2001


I (Mark Myatt) wrote:

> > I have some data on parasites on apple leaves and want to do a
> > goodness of fit test to a Poisson distribution. This seems to
> > do it:
> > 
> >   mites <- c(rep(0,70),
> >              rep(1,38),
> >              rep(2,17), 
> >              rep(3,10), 
> >              rep(4,9), 
> >              rep(5,3), 
> >              rep(6,2), 
> >              rep(7,1))
> > 
> >   tab <- table(mites)
> >   NSU <- length(mites)
> >   N <- sum(mites)
> >   NSU.Mean <- N / NSU
> >   exp <- dpois(as.numeric(dimnames(tab)[[1]]), NSU.Mean) *  NSU
> >   chi2 <- sum(((as.vector(tab) - exp)^2) / exp)
> > 
> >   tab
> >   exp
> >   chi2
> >   
> > I have some small expected values and so need to collapse some
> > categories. If anyone has code to do that already ... but my
> > question is how would I do the same for a negative binomial
> > distribution?

Jim Lindsey wrote:

>> The negative binomial density function is available in my rmutil
>> library. (www.luc.ac.be/~jlindsey/rcode.html)

Brian Ripley wrote:

>It is in R too, dnbinom.

That was silly of me ... It is even there on the dpois() help page that
I had already visited. I have a further question ... I am not familiar
with the parameters required for dnbinom:

        dnbinom(x, size, prob, log = FALSE)

I am used to 'mu' and 'k'. The parameter 'prob' is I think the failure
probability:

        tab[[1]] / NSU

Is that right? I am unsure of how to specify the 'size' parameter in
this context.

I have a proto-function working without dnbinom():

neg.bin.gof <- function(data)
  {
  
  case.count <- vector(mode = "numeric", length = max(data) + 2)
  observed <- vector(mode = "numeric", length = max(data) + 2)
  neg.bin.expected <- vector(mode = "numeric", length = max(data) + 2)
  neg.bin.partial.chi.square <- vector(mode = "numeric", length = 
    max(data) + 2)
  
  #
  # calculate frequencies
  #
  for (i in 0:(max(data) + 1))
    {
    case.count[i + 1] <- i
    observed[i + 1] <- sum(data == i)
    }  
 
  #
  # calculate summary measures
  #
  N.SU <- length(data)
  cases <- sum(data)
  cases.per.SU.mean <- cases / N.SU
  cases.per.SU.var <- var(data)
  
  #
  # find negative parameter 'k' by successive approximation
  #
  delta.direction <- 0
  delta.difference <- 1
  k <- cases.per.SU.mean ^ 2 / (cases.per.SU.var - cases.per.SU.mean)
  repeat  
    { 
    difference <- log10(N.SU / observed[1]) - k * log10(1 + 
      (cases.per.SU.mean / k))
    if(abs(difference) < 0.0000001) break
    if (delta.direction != sign(difference)) delta.difference <- 
      delta.difference / 10
    delta.direction <- sign(difference)
    k <- k + delta.direction * delta.difference
    }  

  #
  # calculate expected number of sampling units with ZERO cases
  #
  neg.bin.expected[1]  <- (1 + (cases.per.SU.mean / k)) ^ (0 - k) * N.SU
  neg.bin.sum.expected <- neg.bin.expected[1]
  
  #
  # calculate expected number of sampling units with 1, 2, etc. cases 
  #
  for (i in 2:(max(data) + 1))
    {
    neg.bin.expected[i] <- (cases.per.SU.mean / (cases.per.SU.mean + k)) 
      * ((k + i - 1 - 1) / (i - 1)) * neg.bin.expected[i - 1]
    neg.bin.sum.expected <- neg.bin.sum.expected + neg.bin.expected[i]
    } 
  neg.bin.expected[max(data) + 2] <- N.SU - neg.bin.sum.expected  

  #
  # calculate chi-square statistic
  #
  neg.bin.partial.chi.square <- (observed - neg.bin.expected) ^ 2 / 
    neg.bin.expected
  neg.bin.chi.square.statistic <- sum(neg.bin.partial.chi.square)  

  #
  # some output ... punt out as a list when function debugged
  #
  output.table <- cbind(case.count, observed, neg.bin.expected,
    neg.bin.partial.chi.square)
  print(output.table)  
  print(neg.bin.chi.square.statistic)
  print(k)
}

But I really should plug a gap in my ignorance.

Mark


--
Mark Myatt


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._



More information about the R-help mailing list