[R] Poisson Regression: questions about tests of assumptions

Mon Oct 15 00:31:51 CEST 2012

Hi Eiko,

This is not a direct response to your question, but I thought you
might find these pages helpful:

On negative binomial:

http://www.ats.ucla.edu/stat/R/dae/nbreg.htm

and zero inflated poisson:

http://www.ats.ucla.edu/stat/R/dae/zipoisson.htm

In general this page lists a variety of different models with examples:

http://www.ats.ucla.edu/stat/dae/

I wrote the code for most of those pages. Some of your questions seema
 little more specific than would be addressed on those general
introductory pages, but if there are particular things you would like
to see done, I am open to hearing about it.

One thing I do want to mention is that for publication or presentation
purposes, graphs of predicted counts can be quite nice for readers---I
tried to show how to do that in all of those pages, and you can even
use bootstrapping (which I gave examples for at least in the zero
inflated models) to get robust confidence intervals.

Cheers,

Josh

On Sun, Oct 14, 2012 at 3:14 PM, Eiko Fried <torvon at gmail.com> wrote:
> Thank you for the detailed answer, that was really helpful.
> I did some excessive reading and calculating in the last hours since your
> reply, and have a few (hopefully much more informed) follow up questions.
>
>
> 1) In the vignette("countreg", package = "pscl"), LLH, AIC and BIC values
> are listed for the models Negative-binomial (NB), Zero-Inflated (ZI), ZI
> NB, Hurdle NB, and Poisson (Standard). And although I found a way to
> determine LLH and DF for all models, BIC & AIC values are not displayed by
> default, neither using the code given in the vignette. How do I calculate
> these? (AIC is given as per default only in 2 models, BIC in none).
>
>
> 2) For the zero-inflated models, the first block of count model
> coefficients is only in the output in order to compare the changes,
> correct? That is, in my results in a paper I would only report the second
> block of (zero-inflation) model coefficients? Or do I misunderstand
> something? I am confused because in their large table to compare
> coefficients, they primarily compare the first block of coefficients (p. 18)
> R> fm <- list("ML-Pois" = fm_pois, "Quasi-Pois" = fm_qpois, "NB" = fm_nbin,
> + "Hurdle-NB" = fm_hurdle, "ZINB" = fm_zinb)
> R> sapply(fm, function(x) coef(x)[1:8])
>
>
> 3)
>
>> There are various formal tests for this, e.g., dispersiontest() in package
>> "AER".
>>
>
> I have to run 9 models - I am testing the influence of several predictors
> on different individual symptoms of a mental disorder, as "counted" in the
> last week (0=never in the last week, to 3 = on all day within the last
> week).
> So I'm regressing the same predictors onto 9 different symptoms in 9
> models.
>
> Dispersiontest() for these 9 models result in 3-4 overdispersed models
> (depending if testing one- or two-sided on p=.05 level), 2 underdispersed
> models, and 4 non-significant models. The by far largest dispersion value
> is 2.1 in a model is not overdispersed according to the test, but that's
> the symptom with 80% zeros, maybe that plays a role here.
>
> Does BN also make sense in underdispersed models?
>
> However, overdispersion can already matter before this is detected by a
>> significance test. Hence, if in doubt, I would simply use an NB model and
>> you're on the safe side. And if the NB's estimated theta parameter turns
>> out to be extremely large (say beyond 20 or 30), then you can still switch
>> back to Poisson if you want.
>
>
> Out of the 9 models, the 3 overdispersed NB models "worked" with theta
> estimates between 0.4 and 8.6. The results look just fine, so I guess NB is
> appropriate here.
>
> 4 other models (including the 2 underdispersed ones) gave warnings (theta
> iteration / alternation limit reached), and although the other values look
> ok (estimates, LLH, AIC), theta estimates are between 1.000 and 11.000.
> Could you recommend me what to do here?
>
> Thanks
> T
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/