[R] Specify order of groups negative binomial (glm.nb)

Thu Feb 25 22:19:40 CET 2016

Hi,

Well...as I understand the question:

If Katharine wants to use treatment contrasts, which are the default, the easiest thing to do may be use to ?relevel to specify the reference level for the IV factor.

However, as Bert notes, there are other contrasts that can be used, which would affect the interpretation of the comparisons of the factor levels and the help pages that he references would cover a high level review of the options, with more detail provided in the reference therein.

Also, if there is an excess of zeroes, you may need to consider the use of a zero inflated model and the 'pscl' package on CRAN is worthy of consideration, along with its vignette on count regression models:

https://cran.r-project.org/web/packages/pscl/
https://cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf

The ?vuong test in that package can also be helpful for comparing zero-inflated models with non zero-inflated models.

Regards,

Marc Schwartz

> On Feb 25, 2016, at 2:48 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> 
> You can re-set the contrasts for the factor, though whether this is
> "easier" is a matter of personal preference.
> 
> See ?C or ?contrasts, which I understand to be alternative ways of
> doing the same thing (and would appreciate correction is this is
> wrong). Or this can be done through the "contrasts" argument of
> glm.nb.
> 
> Cheers,
> Bert
> 
> 
> 
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Thu, Feb 25, 2016 at 10:53 AM, Katharine Miller - NOAA Federal
> <katharine.miller at noaa.gov> wrote:
>> Hi,
>> I am using the glm.nb function to evaluate differences in the catch of
>> individual species of fish across three river tributaries.  The dependent
>> variable is catch per unit effort (CPUE), and the independent variable is
>> the the tributary (Trib_cat).  CPUE is derived from the fish counts divided
>> by the effort, so the response is not a count per se, but I think the
>> negative binomial is appropriate because of the large numbers of zeros in
>> the dependent variable. Trib_cat is a column with 1, 2, or 3 depending on
>> which tributary it is representing.
>> 
>> I have the following code:
>> 
>> dataS<-read.table("OtherSpecies.txt", sep="", header=T)
>> 
>> dataS <- within(dataS, {
>>  Trib_cat <- factor(Trib_cat, levels = 1:3, labels = c("MM", "NM", "SM"))
>>  Year<-factor(Year)
>> })
>> 
>> ## separate out the species of interest
>> coregonid<-subset(dataS, Spec_age=="Coregonid")
>> 
>> fit.CT<-glm.nb(CPUE ~ Trib_cat, data=coregonid, link = log)
>> summary(fit.CT)
>> 
>> The result comes out like this:
>> Call:
>> glm.nb(formula = CPUE ~ Trib_cat, data = coregonid4, init.theta =
>> 0.1723775759,
>>    link = log)
>> 
>> Deviance Residuals:
>>    Min       1Q   Median       3Q      Max
>> -0.9239  -0.8055  -0.6687  -0.6286   2.9020
>> 
>> Coefficients:
>>            Estimate Std. Error z value Pr(>|z|)
>> (Intercept)  -0.7805     0.2497  -3.125  0.00178 **
>> Trib_catNM   -0.2140     0.3485  -0.614  0.53921
>> Trib_catSM    0.7393     0.3296   2.243  0.02488 *
>> ---
>> 
>> This gives me the difference in CPUE between the NM and SM tributaries
>> compared to the MM tributary  (acting here as the reference group).  What I
>> need is to compare all of the tributaries with all the others - so I need
>> to run the model twice with a different reference group on the second run.
>>  I can do this by changing the numbering of the Trib_cat field  in the
>> underlying database (e.g. changing MM from 1 to 3, SM from 2 to 1, etc) and
>> re-running the model, but I, as I have a number of species to do this for,
>> I was wondering if there was an easier way to specify which group to use as
>> the reference group when calling the model.
>> 
>> Thanks for any help.