[R] Poisson Regression: questions about tests of assumptions

Sun Oct 14 18:13:07 CEST 2012

On Sun, 14 Oct 2012, Eiko Fried wrote:

> I would like to test in R what regression fits my data best. My dependent
> variable is a count, and has a lot of zeros.
>
> And I would need some help to determine what model and family to use
> (poisson or quasipoisson, or zero-inflated poisson regression), and how to
> test the assumptions.
>
> 1) Poisson Regression: as far as I understand, the strong assumption is
> that dependent variable mean = variance. How do you test this? How close
> together do they have to be? Are unconditional or conditional mean and
> variance used for this? What do I do if this assumption does not hold?

There are various formal tests for this, e.g., dispersiontest() in package
"AER". Alternatively, you can use a simple likelihood-ratio test (e.g., by
means of lrtest() in "lmtest") between a Poisson and a negative binomial
(NB) fit. The p-value can even be halved because the Poisson model lies on
the boundary of the NB theta parameter range (theta = infinity).
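A minimal sketch of the likelihood-ratio comparison, done by hand so that it needs only base R and MASS (a recommended package that ships with R). The data are simulated and the names x, y, m_pois, m_nb are illustrative. lrtest(m_pois, m_nb) from "lmtest" gives the same statistic; the halving for the boundary is applied manually here.

```r
library(MASS)   # glm.nb(); MASS ships with R as a recommended package

set.seed(1)
n <- 1000
x <- rnorm(n)
## Simulated overdispersed counts: NB with theta (size) = 2
y <- rnbinom(n, mu = exp(0.5 + 0.5 * x), size = 2)

m_pois <- glm(y ~ x, family = poisson)
m_nb   <- glm.nb(y ~ x)

## LR statistic; the NB fit has exactly one extra parameter (theta)
lr      <- 2 * (as.numeric(logLik(m_nb)) - as.numeric(logLik(m_pois)))
df_diff <- attr(logLik(m_nb), "df") - attr(logLik(m_pois), "df")
p_chisq <- pchisq(lr, df = df_diff, lower.tail = FALSE)

## The Poisson model is the boundary case theta = Inf, so halve the p-value
p_halved <- p_chisq / 2
c(LR = lr, df = df_diff, p = p_chisq, p_halved = p_halved)
```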

However, overdispersion can already matter before it is detected by a
significance test. Hence, if in doubt, I would simply use an NB model to be
on the safe side. And if the NB's estimated theta parameter turns out to be
extremely large (say, beyond 20 or 30), you can still switch back to Poisson
if you want.
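The rule of thumb can be sketched as follows. suggest_family() is a hypothetical helper, not part of MASS, and its default cutoff of 25 is an arbitrary pick from the 20 to 30 range mentioned above. glm.nb() stores the estimate and its standard error in the theta and SE.theta components of the fit.

```r
library(MASS)

## Hypothetical helper (not part of MASS): suggests switching back to
## Poisson once the estimated theta exceeds a cutoff (25 is an arbitrary
## choice within the 20-30 range).
suggest_family <- function(fit_nb, cutoff = 25) {
  stopifnot(inherits(fit_nb, "negbin"))
  if (fit_nb$theta > cutoff) "poisson" else "negative binomial"
}

set.seed(2)
n <- 1000
x <- rnorm(n)
y <- rnbinom(n, mu = exp(1 + 0.3 * x), size = 1.5)   # clearly overdispersed

m_nb <- glm.nb(y ~ x)
c(theta = m_nb$theta, SE = m_nb$SE.theta)   # estimate and its standard error
suggest_family(m_nb)
```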

> 2) I read that if variance is greater than mean we have overdispersion, 
> and a potential way to deal with this is including more independent 
> variables, or family=quasipoisson. Does this distribution have any other 
> requirements or assumptions? What test do I use to see whether 1) or 2) 
> fits better - simply anova(m1,m2)?

The quasipoisson family yields the same parameter estimates as the Poisson
family; only the inference (standard errors and tests) is adjusted for the
estimated dispersion.
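To see this concretely, here is a small sketch on simulated data (all names illustrative). Quasi-Poisson assumes Var(y | x) = phi * mu and estimates phi from the Pearson residuals as X^2 / (n - k). The point estimates are identical, and the standard errors are the Poisson ones multiplied by sqrt(phi).

```r
set.seed(3)
n <- 500
x <- rnorm(n)
y <- rnbinom(n, mu = exp(1 + 0.4 * x), size = 2)    # overdispersed counts

m_p  <- glm(y ~ x, family = poisson)
m_qp <- glm(y ~ x, family = quasipoisson)

## Identical point estimates
all.equal(coef(m_p), coef(m_qp))

## Dispersion estimate: Pearson X^2 / residual df; should agree with the
## value summary() uses, up to the IRLS convergence tolerance
phi <- sum(residuals(m_p, type = "pearson")^2) / df.residual(m_p)
c(phi = phi, summary_dispersion = summary(m_qp)$dispersion)

## Standard errors: Poisson SEs inflated by sqrt(dispersion)
se_p  <- summary(m_p)$coefficients[, "Std. Error"]
se_qp <- summary(m_qp)$coefficients[, "Std. Error"]
cbind(poisson = se_p, quasipoisson = se_qp, ratio = se_qp / se_p)
```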

> 3) I also read that negative-binomial distribution can be used when 
> overdispersion appears. How do I do this in R?

glm.nb() in "MASS" is one of the standard options.
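A minimal usage sketch on simulated data (the data frame d and its columns are illustrative). glm.nb() takes the same formula/data interface as glm() and uses a log link by default.

```r
library(MASS)

set.seed(4)
n <- 2000
d <- data.frame(x = rnorm(n))
## True model: log(mu) = 0.5 + 0.8 * x, NB theta (size) = 2
d$y <- rnbinom(n, mu = exp(0.5 + 0.8 * d$x), size = 2)

## Same formula interface as glm(); log link by default
m_nb <- glm.nb(y ~ x, data = d)
summary(m_nb)    # coefficient table plus theta and its standard error
coef(m_nb)       # should be close to (0.5, 0.8) here
m_nb$theta       # estimated theta, should be close to 2 here
```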

> What is the difference to quasipoisson?

The NB is a likelihood-based model, while the quasipoisson is not
associated with a likelihood (but has the same conditional mean equation).
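One practical consequence, sketched on simulated data: because the NB has a full likelihood, logLik() and AIC() return finite values (and LR tests are possible), whereas AIC() for a quasipoisson glm is NA. Both models share the log-linear mean but assume different variance functions (NB: mu + mu^2/theta; quasi-Poisson: phi * mu), so their coefficient estimates generally differ somewhat.

```r
library(MASS)

set.seed(5)
n <- 1000
x <- rnorm(n)
y <- rnbinom(n, mu = exp(1 + 0.5 * x), size = 1.5)

m_nb <- glm.nb(y ~ x)
m_qp <- glm(y ~ x, family = quasipoisson)

## NB: full likelihood, so logLik/AIC are available
logLik(m_nb)
AIC(m_nb)

## Quasi-Poisson: no likelihood, so AIC is NA
AIC(m_qp)

## Same mean equation, different variance assumptions:
##   NB:            Var(y | x) = mu + mu^2 / theta
##   quasi-Poisson: Var(y | x) = phi * mu
cbind(negbin = coef(m_nb), quasipoisson = coef(m_qp))
```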

> 4) Zero-inflated Poisson Regression: I read that the Vuong test
> checks which model fits better.
>> vuong(model.poisson, model.zero.poisson)
> Is that correct?

It's one of the possibilities.
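For illustration, here is a sketch of the raw (uncorrected) Vuong statistic computed by hand: the standardized mean of the pointwise log-likelihood differences between two models. To avoid add-on packages, the zero-inflated Poisson (ZIP) model is fitted directly with optim() instead of pscl's zeroinfl(); the simulated data and the names x, z, zip_ll_obs are hypothetical. In practice, vuong() in "pscl" does this for fitted model objects.

```r
set.seed(42)
n <- 2000
x <- rnorm(n)                         # regressor in the count part
z <- rnorm(n)                         # regressor in the zero-inflation part
p_zero <- plogis(-0.5 + 1.0 * z)      # probability of an excess zero
mu     <- exp(0.5 + 0.7 * x)          # Poisson mean
y <- ifelse(runif(n) < p_zero, 0, rpois(n, mu))

## Model 2: ordinary Poisson regression, per-observation log-likelihoods
m_pois  <- glm(y ~ x, family = poisson)
ll_pois <- dpois(y, fitted(m_pois), log = TRUE)

## Model 1: ZIP with log(mu) = b0 + b1 * x and logit(p) = g0 + g1 * z
zip_ll_obs <- function(par) {
  mu_i <- exp(par[1] + par[2] * x)
  p_i  <- plogis(par[3] + par[4] * z)
  ifelse(y == 0,
         log(p_i + (1 - p_i) * exp(-mu_i)),
         log(1 - p_i) + dpois(y, mu_i, log = TRUE))
}
## Maximize the ZIP log-likelihood, starting from the Poisson estimates
fit_zip <- optim(c(coef(m_pois), 0, 0),
                 function(par) -sum(zip_ll_obs(par)),
                 method = "BFGS", control = list(maxit = 500))
ll_zip <- zip_ll_obs(fit_zip$par)

## Vuong: positive values favour the ZIP model, negative ones the Poisson
d <- ll_zip - ll_pois
vuong_stat <- sqrt(n) * mean(d) / sd(d)
p_value    <- pnorm(vuong_stat, lower.tail = FALSE)   # one-sided: ZIP better
c(statistic = vuong_stat, p = p_value)
```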

> 5) ats.ucla.edu has a section about zero-inflated Poisson regression, and
> tests the zero-inflated model (a) against the standard Poisson model (b):
>> m.a <- zeroinfl(count ~ child + camper | persons, data = zinb)
>> m.b <- glm(count ~ child + camper, family = poisson, data = zinb)
>> vuong(m.a, m.b)
> I don't understand what the "| persons" part of the first model does, and
> why you can compare these models. I had expected the regression to be
> the same and just use a different family.

I recommend you read the associated documentation. See 
vignette("countreg", package = "pscl")
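For orientation, here is a brief sketch of the model behind the UCLA call. In pscl's two-part formula y ~ x | z, the regressors before the bar enter the count (Poisson) component, and those after the bar enter the binary zero-inflation component, which uses a logit link by default. For m.a this amounts to:

```latex
\begin{aligned}
P(Y_i = 0) &= \pi_i + (1 - \pi_i)\, e^{-\mu_i}, \\
P(Y_i = k) &= (1 - \pi_i)\, \frac{e^{-\mu_i}\, \mu_i^{k}}{k!}, \qquad k = 1, 2, \dots \\
\log \mu_i &= \beta_0 + \beta_1\, \mathit{child}_i + \beta_2\, \mathit{camper}_i, \\
\operatorname{logit} \pi_i &= \gamma_0 + \gamma_1\, \mathit{persons}_i .
\end{aligned}
```

Setting \pi_i = 0 for all i recovers the Poisson model m.b. Both are therefore models for the same response with the same count regressors, and they differ in the extra zero-generating component.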

For glm.nb() I recommend its accompanying documentation, namely the MASS 
book.

hth,
Z