[R] Question regarding the discrepancy between count model parameter estimates between "pscl" and "MASS"

peter dalgaard pdalgd at gmail.com
Fri Aug 29 13:13:35 CEST 2014


I'm no expert on hurdle models, but it seems that you are unaware that the negative binomial and the truncated negative binomial are quite different things.

-pd


On 29 Aug 2014, at 05:57 , Nick Livingston <nlivingston at ymail.com> wrote:

> I have sought consultation online and in person, to no avail. I hope someone
> on here might have some insight. Any feedback would be most welcome.
> 
> I am attempting to plot predicted values from a two-component hurdle model
> (logistic [suicide attempt yes/no] and negative binomial count [number of
> attempts thereafter]). To do so, I estimated each component separately using
> glm (MASS). While I am able to reproduce hurdle results for the logit
> portion in glm, estimates for the negative binomial count component are
> different.
> 
> Call:
> hurdle(formula = Suicide. ~ Age + gender + Victimization * FamilySupport |
> Age + gender + Victimization * FamilySupport, dist = "negbin", link =
> "logit")
> 
> Pearson residuals:
>     Min      1Q  Median      3Q     Max
> -0.9816 -0.5187 -0.4094  0.2974  5.8820
> 
> Count model coefficients (truncated negbin with log link):
>                                                 Estimate Std. Error z value
> Pr(>|z|)   
> (Intercept)                          -0.29150    0.33127  -0.880   0.3789   
> Age                                      0.17068    0.07556   2.259   0.0239
> *
> gender                                 0.28273    0.31614   0.894   0.3712   
> Victimization                         1.08405    0.18157   5.971 2.36e-09
> ***
> FamilySupport                      0.33629    0.29302   1.148   0.2511   
> Victimization:FamilySupport -0.96831    0.46841  -2.067   0.0387 *
> Log(theta)                            0.12245    0.54102   0.226   0.8209   
> Zero hurdle model coefficients (binomial with logit link):
>                                                  Estimate Std. Error z value
> Pr(>|z|)   
> (Intercept)                           -0.547051   0.215981  -2.533  0.01131
> *
> Age                                     -0.154493   0.063994  -2.414
> 0.01577 *
> gender                                 -0.030942   0.284868  -0.109  0.91350     
> Victimization                          1.073956   0.338015   3.177  0.00149
> **
> FamilySupport                       -0.380360   0.247530  -1.537  0.12439   
> Victimization\:FamilySupport  -0.813329   0.399905  -2.034  0.04197 *
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
> Theta: count = 1.1303
> Number of iterations in BFGS optimization: 23
> Log-likelihood: -374.3 on 25 Df
>> summary(logistic)
> 
> 
> 
> 
> Call:
> glm(formula = SuicideBinary ~ Age + gender = Victimization * FamilySupport,
> family = "binomial")
> 
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max
> -1.9948  -0.8470  -0.6686   1.1160   2.0805
> 
> Coefficients:
>                                                  Estimate Std. Error z value
> Pr(>|z|)   
> (Intercept)                          -0.547051   0.215981  -2.533  0.01131 *
> Age                                    -0.154493   0.063994  -2.414  0.01577
> *
> gender                                -0.030942   0.284868  -0.109  0.91350   
> Victimization                         1.073956   0.338014   3.177  0.00149
> **
> FamilySupport                      -0.380360   0.247530  -1.537  0.12439   
> Victimization:FamilySupport  -0.813329   0.399904  -2.034  0.04197 *
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> (Dispersion parameter for binomial family taken to be 1)
> 
>     Null deviance: 452.54  on 359  degrees of freedom
> Residual deviance: 408.24  on 348  degrees of freedom
>   (52 observations deleted due to missingness)
> AIC: 432.24
> 
> Number of Fisher Scoring iterations: 4
> 
>> summary(Count1)
> 
> 
> 
> 
> 
> 
> Call:
> glm(formula = NegBinSuicide ~ Age + gender + Victimization * FamilySupport,
> family = negative.binomial(theta = 1.1303))
> 
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max
> -1.6393  -0.4504  -0.1679   0.2350   2.1676
> 
> Coefficients:
>                                                 Estimate Std. Error t value
> Pr(>|t|)   
> (Intercept)                            0.60820    0.13779   4.414 2.49e-05
> ***
> Age                                      0.08836    0.04189   2.109   0.0373
> *
> gender                                  0.10983    0.17873   0.615   0.5402   
> Victimization                          0.73270    0.10776   6.799 6.82e-10
> ***
> FamilySupport                        0.10213    0.15979   0.639   0.5241   
> Victimization:FamilySupport   -0.60146    0.24532  -2.452   0.0159 *
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> (Dispersion parameter for Negative Binomial(1.1303) family taken to be
> 0.4549082)
> 
>     Null deviance: 76.159  on 115  degrees of freedom
> Residual deviance: 35.101  on 104  degrees of freedom
>   (296 observations deleted due to missingness)
> AIC: 480.6
> 
> Number of Fisher Scoring iterations: 15
> 
> 
> Alternatively, if there is a simpler way to plot hurdle regression output, or if anyone is award of another means of estimating NB models (I haven't had much luck with vglm from VGAM either), I would be happy to hear about that as well. I'm currently using the "visreg"
> package for plotting.
> 
> Thanks!
> 
>    
> 
>     
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com



More information about the R-help mailing list