[R] Modelling an "incomplete Poisson" distribution ?

Sun Apr 19 01:37:56 CEST 2009

Emmanuel Charpentier <charpent <at> bacbuc.dyndns.org> writes:

> 
> I forgot to add that yes, I've done my homework, and that it seems to me
> that answers pointing to zero-inflated Poisson (and negative binomial)
> are irrelevant ; I do not have a mixture of distributions but only part
> of one distribution, or, if you'll have it, a "zero-deflated Poisson".
> 
> An answer by Brian Ripley
> (http://finzi.psych.upenn.edu/R/Rhelp02/archive/11029.html) to a similar
> question leaves me a bit dismayed : if it is easy to compute the
> probability function of this zero-deflated RV (off the top of my head,
> Pr(X=x) = e^(-lambda) * lambda^x / (x! * (1 - e^(-lambda)))), and if I
> think that I'm (still) able to use optim to ML-estimate lambda, using
> this to (correctly) model my problem set and test it amounts to
> re-writing some
> (large) part of glm. Furthermore, I'd be a bit embarrassed to test it
> cleanly (i.e. justifiably) : off the top of my head, only the
> likelihood ratio test seems readily applicable to my problem. Testing
> contrasts in my covariates ... hum !
> 
> So if someone has written something to that effect, I'd be awfully glad
> to use it. A not-so-cursory look at the existing packages did not ring
> any bells to my (admittedly untrained) ears...
> 
> Of course, I could also bootstrap the damn thing and study the
> distribution of my contrasts. I'd still be hard pressed to formally
> test hypotheses on factors...
> 

  I would call this a truncated Poisson distribution, related
to hurdle models.  You could probably use the hurdle function
in the pscl package to do this, by ignoring the fitting to
the zero part of the model.  On the other hand, it might break
if there are no zeros at all (adding some zeros would be a
pretty awful hack/workaround).

  If you defined a dtpoisson() function for the density of the
truncated Poisson model, you could probably also use bbmle
with the formula interface and the "parameters" argument.
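
  A minimal sketch of what such a dtpoisson() might look like, using
only base R. The function body, the simulated data and the use of
optimize() in place of bbmle are all assumptions made for illustration;
the density itself is the one quoted above, Poisson probabilities
rescaled by 1 - exp(-lambda). The function takes a `log` argument in the
style of R's built-in density functions.

```r
# Zero-truncated Poisson density: Pr(X = x | X > 0), for x = 1, 2, ...
# (hypothetical helper; the name follows the suggestion in the text)
dtpoisson <- function(x, lambda, log = FALSE) {
  # log Pr(X = x) minus log(1 - exp(-lambda)); expm1() keeps this
  # accurate when lambda is small
  logp <- dpois(x, lambda, log = TRUE) - log(-expm1(-lambda))
  logp[x < 1] <- -Inf   # zero (and anything below) is outside the support
  if (log) logp else exp(logp)
}

# Simulated data, for illustration only: Poisson draws with zeros dropped
set.seed(1)
y <- rpois(2000, lambda = 1.5)
y <- y[y > 0]

# Negative log-likelihood, parameterised by log(lambda) so that
# lambda stays positive during the search
nll <- function(loglambda) -sum(dtpoisson(y, exp(loglambda), log = TRUE))

fit <- optimize(nll, interval = c(-5, 5))
lambda_hat <- exp(fit$minimum)
lambda_hat
```

  The estimate should land close to the 1.5 used in the simulation,
whereas the naive mean(y) would be biased upwards because the zeros are
missing.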

  The likelihood ratio test seems absolutely appropriate for
this case.  Why not?
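
  As a rough illustration of such a test, the sketch below fits two
nested truncated-Poisson regressions with a log link by direct
maximisation and compares twice the log-likelihood difference to a
chi-squared distribution. Everything here is assumed for the example:
the simulated data, the single covariate `x`, the dtpoisson() helper
and the use of optim() instead of a packaged fitting function.

```r
# Zero-truncated Poisson density (hypothetical helper, as named above)
dtpoisson <- function(x, lambda, log = FALSE) {
  logp <- dpois(x, lambda, log = TRUE) - log(-expm1(-lambda))
  logp[x < 1] <- -Inf
  if (log) logp else exp(logp)
}

# Simulated data, for illustration only: counts depend on x, zeros dropped
set.seed(2)
n <- 3000
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.3 + 0.4 * x))
keep <- y > 0
d <- data.frame(y = y[keep], x = x[keep])

# Negative log-likelihood with a log link: log(lambda) = X %*% beta
nll <- function(beta, X) {
  lambda <- exp(drop(X %*% beta))
  -sum(dtpoisson(d$y, lambda, log = TRUE))
}

X0 <- model.matrix(~ 1, d)   # null model: intercept only
X1 <- model.matrix(~ x, d)   # alternative: intercept + slope for x

fit0 <- optim(0, nll, X = X0, method = "BFGS")
fit1 <- optim(c(0, 0), nll, X = X1, method = "BFGS")

# Likelihood ratio statistic and its asymptotic chi-squared p-value
lrt <- 2 * (fit0$value - fit1$value)
df <- ncol(X1) - ncol(X0)
p_value <- pchisq(lrt, df = df, lower.tail = FALSE)
c(statistic = lrt, df = df, p.value = p_value)
```

  The same pattern extends to factors: build the two design matrices
with model.matrix(), and the degrees of freedom are the difference in
their column counts.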

  Ben Bolker