[R] Specify order of groups negative binomial (glm.nb)

Katharine Miller - NOAA Federal katharine.miller at noaa.gov
Fri Feb 26 02:44:31 CET 2016


Yes.  relevel looks good!  I will give that a try.

Thanks!

Katharine B. Miller, PhD
Research Fisheries Biologist
NMFS, Alaska Fisheries Science Center
17109 Lena Loop Rd
Juneau, AK 99801
(907) 789-6410
(907) 789-6094 (fax)




On Thu, Feb 25, 2016 at 2:12 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:

> Ah yes. Forgot about relevel().  That would  be simpler.
>
> Cheers,
> Bert
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Thu, Feb 25, 2016 at 1:19 PM, Marc Schwartz <marc_schwartz at me.com>
> wrote:
> > Hi,
> >
> > Well...as I understand the question:
> >
> > If Katharine wants to use treatment contrasts, which are the default,
> the easiest thing to do may be use to ?relevel to specify the reference
> level for the IV factor.
> >
> > However, as Bert notes, there are other contrasts that can be used,
> which would affect the interpretation of the comparisons of the factor
> levels and the help pages that he references would cover a high level
> review of the options, with more detail provided in the reference therein.
> >
> > Also, if there is an excess of zeroes, you may need to consider the use
> of a zero inflated model and the 'pscl' package on CRAN is worthy of
> consideration, along with its vignette on count regression models:
> >
> > https://cran.r-project.org/web/packages/pscl/
> > https://cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf
> >
> > The ?vuong test in that package can also be helpful for comparing
> zero-inflated models with non zero-inflated models.
> >
> > Regards,
> >
> > Marc Schwartz
> >
> >
> >> On Feb 25, 2016, at 2:48 PM, Bert Gunter <bgunter.4567 at gmail.com>
> wrote:
> >>
> >> You can re-set the contrasts for the factor, though whether this is
> >> "easier" is a matter of personal preference.
> >>
> >> See ?C or ?contrasts, which I understand to be alternative ways of
> >> doing the same thing (and would appreciate correction is this is
> >> wrong). Or this can be done through the "contrasts" argument of
> >> glm.nb.
> >>
> >> Cheers,
> >> Bert
> >>
> >>
> >>
> >>
> >> Bert Gunter
> >>
> >> "The trouble with having an open mind is that people keep coming along
> >> and sticking things into it."
> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>
> >>
> >> On Thu, Feb 25, 2016 at 10:53 AM, Katharine Miller - NOAA Federal
> >> <katharine.miller at noaa.gov> wrote:
> >>> Hi,
> >>> I am using the glm.nb function to evaluate differences in the catch of
> >>> individual species of fish across three river tributaries.  The
> dependent
> >>> variable is catch per unit effort (CPUE), and the independent variable
> is
> >>> the the tributary (Trib_cat).  CPUE is derived from the fish counts
> divided
> >>> by the effort, so the response is not a count per se, but I think the
> >>> negative binomial is appropriate because of the large numbers of zeros
> in
> >>> the dependent variable. Trib_cat is a column with 1, 2, or 3 depending
> on
> >>> which tributary it is representing.
> >>>
> >>> I have the following code:
> >>>
> >>> dataS<-read.table("OtherSpecies.txt", sep="", header=T)
> >>>
> >>> dataS <- within(dataS, {
> >>>  Trib_cat <- factor(Trib_cat, levels = 1:3, labels = c("MM", "NM",
> "SM"))
> >>>  Year<-factor(Year)
> >>> })
> >>>
> >>> ## separate out the species of interest
> >>> coregonid<-subset(dataS, Spec_age=="Coregonid")
> >>>
> >>> fit.CT<-glm.nb(CPUE ~ Trib_cat, data=coregonid, link = log)
> >>> summary(fit.CT)
> >>>
> >>> The result comes out like this:
> >>> Call:
> >>> glm.nb(formula = CPUE ~ Trib_cat, data = coregonid4, init.theta =
> >>> 0.1723775759,
> >>>    link = log)
> >>>
> >>> Deviance Residuals:
> >>>    Min       1Q   Median       3Q      Max
> >>> -0.9239  -0.8055  -0.6687  -0.6286   2.9020
> >>>
> >>> Coefficients:
> >>>            Estimate Std. Error z value Pr(>|z|)
> >>> (Intercept)  -0.7805     0.2497  -3.125  0.00178 **
> >>> Trib_catNM   -0.2140     0.3485  -0.614  0.53921
> >>> Trib_catSM    0.7393     0.3296   2.243  0.02488 *
> >>> ---
> >>>
> >>> This gives me the difference in CPUE between the NM and SM tributaries
> >>> compared to the MM tributary  (acting here as the reference group).
> What I
> >>> need is to compare all of the tributaries with all the others - so I
> need
> >>> to run the model twice with a different reference group on the second
> run.
> >>>  I can do this by changing the numbering of the Trib_cat field  in the
> >>> underlying database (e.g. changing MM from 1 to 3, SM from 2 to 1,
> etc) and
> >>> re-running the model, but I, as I have a number of species to do this
> for,
> >>> I was wondering if there was an easier way to specify which group to
> use as
> >>> the reference group when calling the model.
> >>>
> >>> Thanks for any help.
> >
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list