[R] Specify order of groups negative binomial (glm.nb)

Thu Feb 25 21:48:19 CET 2016

You can re-set the contrasts for the factor, though whether this is
"easier" is a matter of personal preference.

See ?C or ?contrasts, which I understand to be alternative ways of
doing the same thing (and would appreciate correction is this is
wrong). Or this can be done through the "contrasts" argument of
glm.nb.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Thu, Feb 25, 2016 at 10:53 AM, Katharine Miller - NOAA Federal
<katharine.miller at noaa.gov> wrote:
> Hi,
> I am using the glm.nb function to evaluate differences in the catch of
> individual species of fish across three river tributaries.  The dependent
> variable is catch per unit effort (CPUE), and the independent variable is
> the the tributary (Trib_cat).  CPUE is derived from the fish counts divided
> by the effort, so the response is not a count per se, but I think the
> negative binomial is appropriate because of the large numbers of zeros in
> the dependent variable. Trib_cat is a column with 1, 2, or 3 depending on
> which tributary it is representing.
>
> I have the following code:
>
> dataS<-read.table("OtherSpecies.txt", sep="", header=T)
>
> dataS <- within(dataS, {
>   Trib_cat <- factor(Trib_cat, levels = 1:3, labels = c("MM", "NM", "SM"))
>   Year<-factor(Year)
> })
>
> ## separate out the species of interest
> coregonid<-subset(dataS, Spec_age=="Coregonid")
>
> fit.CT<-glm.nb(CPUE ~ Trib_cat, data=coregonid, link = log)
> summary(fit.CT)
>
> The result comes out like this:
> Call:
> glm.nb(formula = CPUE ~ Trib_cat, data = coregonid4, init.theta =
> 0.1723775759,
>     link = log)
>
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max
> -0.9239  -0.8055  -0.6687  -0.6286   2.9020
>
> Coefficients:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)  -0.7805     0.2497  -3.125  0.00178 **
> Trib_catNM   -0.2140     0.3485  -0.614  0.53921
> Trib_catSM    0.7393     0.3296   2.243  0.02488 *
> ---
>
> This gives me the difference in CPUE between the NM and SM tributaries
> compared to the MM tributary  (acting here as the reference group).  What I
> need is to compare all of the tributaries with all the others - so I need
> to run the model twice with a different reference group on the second run.
>   I can do this by changing the numbering of the Trib_cat field  in the
> underlying database (e.g. changing MM from 1 to 3, SM from 2 to 1, etc) and
> re-running the model, but I, as I have a number of species to do this for,
> I was wondering if there was an easier way to specify which group to use as
> the reference group when calling the model.
>
> Thanks for any help.
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.