[R] Problem with zero-inflated negative binomial model in sediment river dynamics

Achim Zeileis Achim.Zeileis at uibk.ac.at
Wed Aug 14 12:07:34 CEST 2013


On Tue, 13 Aug 2013, Cade, Brian wrote:

> Lauria:  For historical reasons the logistic regression (binomial with
> logit link) model portion of a zero-inflated count model is usually
> structured to predict the probability of the 0 counts rather than the
> nonzero (>=1) counts so the coefficients will be the negative of what you
> expect based on the count model portion (as in your output).  It is simple
> to interpret the probability of the logistic regression portion as the
> probability of the nonzero counts by just taking the negative of the
> coefficient estimates provided for the probability of the zero counts.

This is a common interpretation, but it is not quite correct.

The zero-inflation model is a mixture model of two components: (1) a count 
component (Poisson, NB, ...), and (2) a zero mass component (i.e., zero 
with probability 1). Hence, the observed zeros in the data can come from 
both sources: either they are "random" zeros from component (1) or 
"excess" zeros from component (2).
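
As a rough illustration of the two sources of zeros, here is a minimal
simulation sketch in base R. The values of pi0, mu and theta are made up
for the illustration and are not taken from the fitted model.

```r
set.seed(123)

n     <- 10000
pi0   <- 0.3   # assumed probability of belonging to the zero mass component (2)
mu    <- 2     # assumed mean of the count component (1)
theta <- 0.5   # assumed negative binomial size (dispersion) parameter

# Component membership: TRUE means the observation comes from the zero mass
from_zero_mass <- rbinom(n, 1, pi0) == 1

# Draws from the count component; these can be zero as well
count_draw <- rnbinom(n, size = theta, mu = mu)

# Observed response: zero mass observations are always 0
y <- ifelse(from_zero_mass, 0L, count_draw)

excess_zeros <- sum(from_zero_mass)                     # zeros from component (2)
random_zeros <- sum(!from_zero_mass & count_draw == 0)  # zeros from component (1)

c(excess = excess_zeros, random = random_zeros, observed = sum(y == 0))
```

In real data only the total number of zeros is observed; the split into
the two sources is known here only because the data are simulated.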

The binomial zero-inflation part of the model predicts the probability 
that a given observation belongs to component (2), i.e., the probability 
of an "excess zero". But this is _not_ the probability of observing a zero 
in the data, which is at least as large as the excess zero probability 
because the count component contributes zeros as well.
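
To make the difference concrete, the following sketch plugs the
coefficients from the quoted zeroinfl() output into the mixture formula
P(Y = 0) = p + (1 - p) * P(count = 0). The discharge value of 8 is a
hypothetical choice for illustration only; the object names are mine.

```r
# Coefficients copied from the zeroinfl() output quoted in this thread
count_coef <- c(intercept = 2.557066, discharge = 0.064698)  # log link
zero_coef  <- c(intercept = 13.01011, discharge = -1.64293)  # logit link
theta      <- 0.4604

discharge <- 8  # hypothetical discharge value, not from the data

# Mean of the count component
mu <- exp(count_coef[["intercept"]] + count_coef[["discharge"]] * discharge)

# Probability of an excess zero (what the zero-inflation part models)
p_excess <- plogis(zero_coef[["intercept"]] + zero_coef[["discharge"]] * discharge)

# Probability that the negative binomial count component yields a zero
p_count_zero <- dnbinom(0, size = theta, mu = mu)

# Probability of observing a zero from either source
p_zero <- p_excess + (1 - p_excess) * p_count_zero

c(excess_zero = p_excess, observed_zero = p_zero)
```

Negating the zero-inflation coefficients therefore gives the probability
of not being an excess zero, 1 - p_excess, rather than the probability of
a non-zero count, 1 - p_zero.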

If you want a model that first models zero vs. non-zero and second the 
non-zero counts, use the hurdle model. This has exactly the interpretation 
you describe above.
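
The zero vs. non-zero part of such a hurdle model can be sketched in base
R as a plain binomial GLM on the indicator y > 0. The data below are
simulated, and the names x, y and zero_part are hypothetical. As far as I
am aware, hurdle() in pscl with its default binomial zero hurdle
estimates the same coefficients for this part, but treat that as an
assumption to check against the package documentation.

```r
set.seed(42)

n <- 500
x <- runif(n, 0, 10)   # hypothetical covariate (think: discharge)

# Step 1: does the observation cross the hurdle (non-zero count)?
is_positive <- rbinom(n, 1, plogis(-2 + 0.5 * x)) == 1

# Step 2: positive counts. A shifted negative binomial is used here as a
# simple stand-in; a real hurdle model uses a zero-truncated distribution.
positive_count <- 1L + rnbinom(n, size = 0.5, mu = exp(1 + 0.05 * x))

y <- ifelse(is_positive, positive_count, 0L)

# Binomial GLM for non-zero vs. zero: the coefficients describe the
# probability of a non-zero count directly
zero_part <- glm(I(y > 0) ~ x, family = binomial)

coef(zero_part)
```

Here a positive coefficient for x means that non-zero counts become more
likely as x increases, so the sign can be read in the same direction as
in the count part.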

Best,
Z

> Brian
>
> Brian S. Cade, PhD
>
> U. S. Geological Survey
> Fort Collins Science Center
> 2150 Centre Ave., Bldg. C
> Fort Collins, CO  80526-8818
>
> email:  cadeb at usgs.gov <brian_cade at usgs.gov>
> tel:  970 226-9326
>
>
>
> On Tue, Aug 13, 2013 at 9:06 AM, Lauria, Valentina <
> valentina.lauria at nuigalway.ie> wrote:
>
>> Dear All,
>>
>> I am running a negative binomial model in R using the package pscl in order
>> to estimate bed sediment movements versus river discharge. Currently we
>> have deployed 4 different plates to test if a combination of more than one
>> plate would better describe the sediment movements when the river discharge
>> changes over time.
>>
>> My data are positively skewed and zero-inflated. I did run both
>> zero-inflated Poisson and zero-inflated negative binomial regression and
>> compared them using the VUONG test which showed that the negative binomial
>> works better than a simple zero-inflated Poisson.
>>
>> My models look like:
>>
>>
>> 1) plate1 ~ river discharge
>> 2) (plate 1 + plate 2) ~ river discharge
>> 3) (plate 1 + plate 2 +plate 3) ~ river discharge
>> 4) (plate 1 + plate 2 + plate 3 + plate 4) ~ river discharge
>>
>>
>> My main problem, as I am new to this type of model, is that I get a
>> different sign for the coefficient of discharge in the output of the
>> zero-inflated negative binomial model (please see below). What does this
>> mean? Also how could I compare the different models (1-4) i.e. what tells
>> me which is performing best? Thank you very much in advance for any
>> comments and suggestions!!
>>
>> Kind Regards,
>> Valentina
>>
>>
>> Call:
>> zeroinfl(formula = plate1 ~ discharge, data = datafit_plates, dist =
>> "negbin", EM = TRUE)
>> Pearson residuals:
>>     Min      1Q  Median      3Q     Max
>> -0.6770 -0.3564 -0.2101 -0.0814 12.3421
>>
>> Count model coefficients (negbin with log link):
>>              Estimate Std. Error z value Pr(>|z|)
>> (Intercept)  2.557066   0.036593   69.88   <2e-16 ***
>> discharge    0.064698   0.001983   32.63   <2e-16 ***
>> Log(theta)  -0.775736   0.012451  -62.30   <2e-16 ***
>>
>> Zero-inflation model coefficients (binomial with logit link):
>>             Estimate Std. Error z value Pr(>|z|)
>> (Intercept) 13.01011    0.22602   57.56   <2e-16 ***
>> discharge   -1.64293    0.03092  -53.14   <2e-16 ***
>> Theta = 0.4604
>> Number of iterations in BFGS optimization: 1
>> Log-likelihood: -6.933e+04 on 5 Df
>>
>>
>>
>>
>>
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>


