[R] Question regarding the discrepancy between count model parameter estimates between "pscl" and "MASS"

Fri Aug 29 18:46:36 CEST 2014

Thank you for your responses. 

Since my previous attempt to manually truncate my DV didn't work, I'm very interested in trying again using the zerotrun() function. However, I attempted to install "countreg" but received the following notification:

        Warning in install.packages :
              unable to access index for repository http://R-Forge.R-project.org/bin/macosx/contrib/3.0

              package ‘countreg’ is available as a source package but not as a binary

         Warning in install.packages :
              package ‘countreg’ is not available (for R version 3.0.3)

I received the same message when attempting to install it in version 3.1.0, and the latest version, 3.1.1. Am I missing something?

Thank you again. I appreciate your input.

-Nick
--------------------------------------------
On Fri, 8/29/14, Achim Zeileis <Achim.Zeileis at uibk.ac.at> wrote:

 Subject: Re: [R] Question regarding the discrepancy between count model parameter estimates between "pscl" and "MASS"
 To: "peter dalgaard" <pdalgd at gmail.com>
 Cc: "Nick Livingston" <nlivingston at ymail.com>, r-help at r-project.org
 Date: Friday, August 29, 2014, 5:26 AM

 On Fri, 29 Aug 2014,
 peter dalgaard wrote:

 >
 I'm no expert on hurdle models, but it seems that you
 are unaware that 
 > the negative binomial
 and the truncated negative binomial are quite 
 > different things.

 Yes. You can replicate the truncated count part
 of the hurdle model with 
 the zerotrunc()
 function from the "countreg" package. The package
 is not 
 yet on CRAN but can be easily
 installed from R-Forge.

 > -pd
 >
 >
 > On 29 Aug 2014, at
 05:57 , Nick Livingston <nlivingston at ymail.com>
 wrote:
 >
 >> I have
 sought consultation online and in person, to no avail. I
 hope someone
 >> on here might have
 some insight. Any feedback would be most welcome.
 >>
 >> I am
 attempting to plot predicted values from a two-component
 hurdle model
 >> (logistic [suicide
 attempt yes/no] and negative binomial count [number of
 >> attempts thereafter]). To do so, I
 estimated each component separately using
 >> glm (MASS). While I am able to
 reproduce hurdle results for the logit
 >> portion in glm, estimates for the
 negative binomial count component are
 >> different.
 >>
 >> Call:
 >>
 hurdle(formula = Suicide. ~ Age + gender + Victimization *
 FamilySupport |
 >> Age + gender +
 Victimization * FamilySupport, dist = "negbin",
 link =
 >> "logit")
 >>
 >> Pearson
 residuals:
 >>     Min   
   1Q  Median      3Q     Max
 >> -0.9816 -0.5187 -0.4094  0.2974 
 5.8820
 >>
 >>
 Count model coefficients (truncated negbin with log
 link):
 >>                   

    Estimate Std. Error z value
 >> Pr(>|z|)
 >>
 (Intercept)                          -0.29150 
   0.33127  -0.880   0.3789
 >> Age                       
               0.17068   
 0.07556   2.259   0.0239
 >> *
 >> gender   

    0.28273   
 0.31614   0.894   0.3712
 >> Victimization               
          1.08405   
 0.18157   5.971 2.36e-09
 >>
 ***
 >> FamilySupport           
           0.33629   
 0.29302   1.148   0.2511
 >> Victimization:FamilySupport -0.96831 
   0.46841  -2.067   0.0387 *
 >> Log(theta)                 
           0.12245   
 0.54102   0.226   0.8209
 >> Zero hurdle model coefficients
 (binomial with logit link):
 >>     

     Estimate Std. Error z value
 >>
 Pr(>|z|)
 >> (Intercept)       

    -0.547051   0.215981  -2.533 
 0.01131
 >> *
 >>
 Age                                 
    -0.154493   0.063994  -2.414
 >> 0.01577 *
 >>
 gender                             
    -0.030942   0.284868  -0.109 
 0.91350
 >> Victimization         

 1.073956   0.338015   3.177 
 0.00149
 >> **
 >>
 FamilySupport                   
    -0.380360   0.247530  -1.537 
 0.12439
 >>
 Victimization\:FamilySupport 
 -0.813329   0.399905  -2.034  0.04197 *
 >> ---
 >> Signif.
 codes:  0 '***' 0.001 '**' 0.01 '*'
 0.05 '.' 0.1 ' ' 1
 >>
 >> Theta: count
 = 1.1303
 >> Number of iterations in
 BFGS optimization: 23
 >>
 Log-likelihood: -374.3 on 25 Df
 >>>
 summary(logistic)
 >>
 >>
 >>
 >>
 >> Call:
 >> glm(formula = SuicideBinary ~ Age +
 gender = Victimization * FamilySupport,
 >> family = "binomial")
 >>
 >> Deviance
 Residuals:
 >>     Min   
    1Q   Median       3Q 
     Max
 >> -1.9948  -0.8470 
 -0.6686   1.1160   2.0805
 >>
 >>
 Coefficients:
 >>             

 Estimate Std. Error z value
 >>
 Pr(>|z|)
 >> (Intercept)       
                   -0.547051   0.215981 
 -2.533  0.01131 *
 >> Age       

 -0.154493   0.063994  -2.414  0.01577
 >> *
 >> gender   

 -0.030942   0.284868  -0.109  0.91350
 >> Victimization               

    1.073956   0.338014   3.177 
 0.00149
 >> **
 >>
 FamilySupport                     
 -0.380360   0.247530  -1.537  0.12439
 >> Victimization:FamilySupport 
 -0.813329   0.399904  -2.034  0.04197 *
 >> ---
 >> Signif.
 codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
 >>
 >> (Dispersion
 parameter for binomial family taken to be 1)
 >>
 >> 
    Null deviance: 452.54  on 359  degrees of
 freedom
 >> Residual deviance: 408.24 
 on 348  degrees of freedom
 >>   (52 observations deleted
 due to missingness)
 >> AIC: 432.24
 >>
 >> Number of
 Fisher Scoring iterations: 4
 >>
 >>> summary(Count1)
 >>
 >>
 >>
 >>
 >>
 >>
 >> Call:
 >>
 glm(formula = NegBinSuicide ~ Age + gender + Victimization *
 FamilySupport,
 >> family =
 negative.binomial(theta = 1.1303))
 >>
 >> Deviance
 Residuals:
 >>     Min   
    1Q   Median       3Q 
     Max
 >> -1.6393  -0.4504 
 -0.1679   0.2350   2.1676
 >>
 >>
 Coefficients:
 >>             

    Estimate Std. Error t value
 >> Pr(>|t|)
 >>
 (Intercept)                           
 0.60820    0.13779   4.414 2.49e-05
 >> ***
 >> Age   
                                   0.08836 
   0.04189   2.109   0.0373
 >> *
 >> gender   
                               0.10983   
 0.17873   0.615   0.5402
 >> Victimization               
           0.73270    0.10776   6.799
 6.82e-10
 >> ***
 >> FamilySupport               
         0.10213   
 0.15979   0.639   0.5241
 >>
 Victimization:FamilySupport   -0.60146   
 0.24532  -2.452   0.0159 *
 >> ---
 >> Signif.
 codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
 >>
 >> (Dispersion
 parameter for Negative Binomial(1.1303) family taken to
 be
 >> 0.4549082)
 >>
 >> 
    Null deviance: 76.159  on 115  degrees of
 freedom
 >> Residual deviance: 35.101 
 on 104  degrees of freedom
 >>   (296 observations deleted
 due to missingness)
 >> AIC: 480.6
 >>
 >> Number of
 Fisher Scoring iterations: 15
 >>
 >>
 >>
 Alternatively, if there is a simpler way to plot hurdle
 regression output, or if anyone is award of another means of
 estimating NB models (I haven't had much luck with vglm
 from VGAM either), I would be happy to hear about that as
 well. I'm currently using the "visreg"
 >> package for plotting.
 >>
 >> Thanks!
 >>
 >>
 >>
 >>
 >>
 >>
 ______________________________________________
 >> R-help at r-project.org
 mailing list
 >> https://stat.ethz.ch/mailman/listinfo/r-help
 >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 >> and provide commented, minimal,
 self-contained, reproducible code.
 >
 > -- 
 > Peter Dalgaard,
 Professor,
 > Center for Statistics,
 Copenhagen Business School
 > Solbjerg
 Plads 3, 2000 Frederiksberg, Denmark
 >
 Phone: (+45)38153501
 > Email: pd.mes at cbs.dk  Priv:
 PDalgd at gmail.com
 >
 >
 ______________________________________________
 > R-help at r-project.org
 mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > and provide commented, minimal,
 self-contained, reproducible code.
 >