[R] Zero inflated negat. binomial model

Thu Feb 4 22:14:13 CET 2010

On Thu, 4 Feb 2010, Luciano La Sala wrote:

> Dear R crew:
>
> I think I am in the right mailing list. I have a very simple dataset consisting of two variables: cestode intensity and chick size (defined as CAPI). Intensity is clearly overdispersed, with way too many zeroes. I'm interested in looking at the association between these two variables, i.e. how well does chick size predict tapeworm intensity?
>
> I fit a zero-inflated negative binomial model using the "pscl" package.
>
> I built my model as follows and got the output below.
>
>> model <- zeroinfl(Int_Cesto ~ CAPI, dist = "negbin", EM = TRUE)
>> model
>
> Call:
> zeroinfl(formula = Int_Cesto ~ CAPI, dist = "negbin", EM = TRUE)
>
> Count model coefficients (negbin with log link):
> (Intercept)         CAPI
>   -2.99182      0.06817
> Theta = 0.4528
>
> Zero-inflation model coefficients (binomial with logit link):
> (Intercept)         CAPI
>    12.1364      -0.1572
>
>> summary(model)
>
> Call:
> zeroinfl(formula = Int_Cesto ~ CAPI, dist = "negbin", EM = TRUE)
>
> Pearson residuals:
>     Min       1Q   Median       3Q      Max
> -0.62751 -0.38842 -0.21303 -0.06899  7.29566
>
> Count model coefficients (negbin with log link):
>            Estimate Std. Error z value Pr(>|z|)
> (Intercept) -2.99182    3.39555  -0.881   0.3783
> CAPI         0.06817    0.04098   1.664   0.0962 .
> Log(theta)  -0.79222    0.45031  -1.759   0.0785 .
>
> Zero-inflation model coefficients (binomial with logit link):
>            Estimate Std. Error z value Pr(>|z|)
> (Intercept) 12.13636    3.71918   3.263  0.00110 **
> CAPI        -0.15720    0.04989  -3.151  0.00163 **
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Theta = 0.4528
> Number of iterations in BFGS optimization: 1
> Log-likelihood: -140.2 on 5 Df
>
> QUESTIONS
>
> 1. Is my model adequately specified?

Hard to say from only the output. But given that you only have these two 
variables it seems like a natural model. You could also compare it with 
the corresponding hurdle() model.

In both models, you can look at observed and expected frequencies for the 
counts 0, 1, 2, etc.
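
A minimal sketch of both suggestions (the hurdle() comparison and the observed vs. expected frequencies). The original data are not available here, so this uses the bioChemists data shipped with pscl as a stand-in; the variables art and ment are from that data set, not from the poster's, and would be replaced by Int_Cesto and CAPI:

```r
library(pscl)

# Stand-in data shipped with pscl (assumption: the poster's data are not available).
data("bioChemists", package = "pscl")

# Zero-inflated and hurdle negative binomial models with the same single regressor.
zinb <- zeroinfl(art ~ ment, data = bioChemists, dist = "negbin")
hnb  <- hurdle(art ~ ment, data = bioChemists, dist = "negbin")

# Observed frequencies of the counts 0, 1, ..., 5.
counts   <- 0:5
observed <- as.vector(table(factor(bioChemists$art, levels = counts)))

# Expected frequencies: sum the predicted probabilities over all observations.
# predict(..., type = "prob") has one column per count 0, 1, ..., max(y).
expected_zinb <- colSums(predict(zinb, type = "prob")[, counts + 1])
expected_hnb  <- colSums(predict(hnb,  type = "prob")[, counts + 1])

freq <- cbind(observed,
              zinb   = round(expected_zinb, 1),
              hurdle = round(expected_hnb, 1))
rownames(freq) <- counts
print(freq)

# Information criteria as a rough side-by-side comparison of the two fits.
AIC(zinb, hnb)
```

Large gaps between the observed and expected columns, especially at 0, would point to a poorly fitting specification.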

> 2. CAPI is included in block 1 of the output, containing the negative 
> binomial regression coefficients, and in block 2, corresponding to 
> the inflation model. Does this make sense? If so...

As I said above: It seems natural to me but I don't have any background 
knowledge in this application.

> 3. How should one interpret these results?

count_CAPI: The mean "Int_Cesto" in the count component seems to increase
   slightly with CAPI, but the effect is significant only at the 10% level.
count_theta: There is clear overdispersion (compared to Poisson).
zero_CAPI: The probability of an inflated zero in "Int_Cesto" decreases
   clearly with CAPI.
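
To put these on more readable scales, the estimates can be back-transformed by hand. This is only a sketch: the coefficients are copied from the summary() output quoted above, while the CAPI values in capi are hypothetical and chosen purely for illustration:

```r
# Coefficients copied from the summary() output quoted above.
count_coef <- c(intercept = -2.99182, CAPI = 0.06817)   # log link
zero_coef  <- c(intercept = 12.13636, CAPI = -0.15720)  # logit link
log_theta  <- -0.79222

# Theta on its natural scale; the NB variance is mu + mu^2 / theta,
# so a small theta means strong overdispersion relative to Poisson.
theta <- exp(log_theta)

# Multiplicative change in the count-component mean per unit of CAPI.
count_ratio <- unname(exp(count_coef["CAPI"]))

# Odds ratio for an inflated (structural) zero per unit of CAPI.
zero_odds_ratio <- unname(exp(zero_coef["CAPI"]))

# Hypothetical CAPI values, for illustration only.
capi <- c(60, 70, 80, 90)

# Probability of an inflated zero and mean of the count component.
p_zero <- plogis(zero_coef["intercept"] + zero_coef["CAPI"] * capi)
mu     <- exp(count_coef["intercept"] + count_coef["CAPI"] * capi)

# Overall expected count of the zero-inflated model.
overall_mean <- (1 - p_zero) * mu

print(data.frame(capi, p_zero = round(p_zero, 3),
                 mu = round(mu, 2), overall_mean = round(overall_mean, 2)))
```

Here count_ratio is about 1.07 and zero_odds_ratio about 0.85, i.e. each additional unit of CAPI raises the count mean by roughly 7% and lowers the odds of an inflated zero by roughly 15%.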

See
   vignette("countreg", package = "pscl")
for a hands-on introduction to Poisson and negative binomial models with 
and without excess zeros.

hth,
Z

> Thanks in advance!
> LFLS