[R] compare GLM coefficients

Tue Nov 23 15:15:27 CET 2010

Michael Bedward <michael.bedward <at> gmail.com> writes:

> 
> Hello Kayce,
> 
> My (very basic) understanding is that you can't directly compare the
> coefficients across models that have different response variables, nor
> could you use AIC and similar metrics of model goodness of fit.
> Instead, I think you have to carefully define what you mean by "reveal
> similar population trends".
> 
> If you treat the model with the count response as your reference, and
> it predicts (for example) population decline of magnitude X over
> period T, then you can investigate to what extent this same trend is
> retrieved by the presence response model. But the specifics of the
> comparison(s) should be closely tied to the population behaviours /
> syndromes / critical points that you are most interested in. If there
> are multiple behaviours of interest you want to know to what extent
> the presence data perform as well as the count data for each of them.
> 
> That's my general take on the style of the approach. Hopefully others
> here will have more detailed and knowledgeable comments for you.
> 
> Michael

  I agree with Michael that it's tricky to compare coefficients from
models with different response variables. In particular, if the responses
don't have the same units then it's hard to know how you could ever
compare them. If the responses *are* in the same units, then you
can extract the coefficients and standard errors and do a t-test on
the difference (?pt is your friend, although you might have to think
a bit about the possibility of unequal standard errors and what to do
about it).
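
  As a rough sketch of that kind of test (not a definitive recipe): the
helper compare_coef() below is a made-up name, the data are simulated
stand-ins for two count series in the same units, and taking the smaller
residual df is just one conservative assumption. Adding the two squared
standard errors treats the two estimates as independent, which will not
hold if both models are fitted to the same observations.

```r
library(MASS)  # provides glm.nb()

set.seed(1)
# Simulated stand-in data: two count series in the same units (hypothetical)
d <- data.frame(year = 1:30, visits = sample(29:33, 30, replace = TRUE))
d$count_a <- rnbinom(30, mu = exp(3 - 0.05 * d$year), size = 2)
d$count_b <- rnbinom(30, mu = exp(3 - 0.02 * d$year), size = 2)

m1 <- glm.nb(count_a ~ year + visits, data = d)
m2 <- glm.nb(count_b ~ year + visits, data = d)

# Hypothetical helper: test the difference between one coefficient in two fits
compare_coef <- function(fit1, fit2, term) {
  s1 <- coef(summary(fit1))[term, ]
  s2 <- coef(summary(fit2))[term, ]
  d_hat <- s1[["Estimate"]] - s2[["Estimate"]]
  # Standard error of the difference, allowing unequal standard errors
  # (assumes the two estimates are independent)
  se <- sqrt(s1[["Std. Error"]]^2 + s2[["Std. Error"]]^2)
  stat <- d_hat / se
  # Assumption: use the smaller residual df as a conservative choice
  df <- min(df.residual(fit1), df.residual(fit2))
  p <- 2 * pt(abs(stat), df = df, lower.tail = FALSE)
  c(difference = d_hat, se = se, statistic = stat, df = df, p.value = p)
}

res <- compare_coef(m1, m2, "year")
print(res)
```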
  A comment: I suspect that using glm.nb() on presence-absence data
doesn't make sense, because a negative binomial is unlikely to fit
binary data.  You probably want to use 

glm(presence~year+visits,family="binomial")

  One possibility (you will have to think about this to decide
whether it makes sense or not) is to fit

glm(presence~year+visits,family=binomial(link="log"))

to fit a model to the log-probability rather than the logit-probability.
Then since both responses will be on the log scale (i.e., proportional
changes in the response variable), it *might* make sense to compare
the coefficients.
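
  A minimal sketch of the log-link fit, under assumptions that may not
match the real data: the data are simulated, "presence" is taken to be
the number of visits in a year on which the species was recorded (so the
response is written as successes and failures out of "visits"), and
starting values are supplied because a log-link binomial fit can fail
when fitted probabilities get close to 1.

```r
set.seed(2)
# Simulated stand-in data (hypothetical): 'presence' is assumed to be the
# number of visits in a year on which the species was recorded.
d <- data.frame(year = 1:30, visits = sample(29:33, 30, replace = TRUE))
d$presence <- rbinom(30, size = d$visits, prob = exp(-0.5 - 0.03 * d$year))

# Binomial model with a log link: coefficients act on log(probability)
fit_log <- glm(cbind(presence, visits - presence) ~ year,
               family = binomial(link = "log"),
               start = c(log(0.5), 0),  # initial fitted probability of 0.5
               data = d)

coef(fit_log)["year"]       # change in log-probability per year
exp(coef(fit_log)["year"])  # multiplicative change in probability per year
```

The year coefficient here is a proportional change per year, which is
the same kind of quantity as the year coefficient from a log-link count
model.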

  You might consider using offsets (see ?offset) rather than
including sampling as an independent variable, if you think that
counts will be strictly proportional to sampling intensity.
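
  For illustration, here is a sketch of the two ways of handling effort,
again on simulated stand-in data (the column names follow the original
post, the values are made up). With offset(log(visits)) the coefficient
of log(visits) is fixed at 1, so the model describes counts per visit.

```r
library(MASS)  # provides glm.nb()

set.seed(3)
# Simulated stand-in data (hypothetical): counts proportional to effort
d <- data.frame(year = 1:30, visits = sample(29:33, 30, replace = TRUE))
d$count <- rnbinom(30, mu = d$visits * exp(-0.5 - 0.04 * d$year), size = 2)

# Effort as an ordinary covariate: its coefficient is estimated
fit_cov <- glm.nb(count ~ year + visits, data = d)

# Effort as an offset: counts assumed strictly proportional to visits
fit_off <- glm.nb(count ~ year + offset(log(visits)), data = d)

coef(fit_cov)  # intercept, year, visits
coef(fit_off)  # intercept, year (no coefficient is estimated for the offset)
```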

> 
> On 23 November 2010 17:20, Kayce anderson <kaycelu <at> gmail.com> wrote:
> > I have a data set of repeated abundance counts over time.  I am
> > investigating whether count data reduced to presence-absence (presence) data
> > will reveal similar population trends.  I am using a negative binomial
> > distribution for the glm (package MASS) because the count data contains many
> > zeros and extreme values.  "count" and "presence" are annual sums for each
> > metric.  I have also included sampling effort (visits) as an independent
> > variable because sampling varies between 29-33 visits per year.  My models
> > are:
> >
> > glm.nb(count ~ year + visits) and
> > glm.nb(presence ~ year + visits)
> >
> > I would like to test whether the coefficients for "year" are significantly
> > different between models.  Please advise me on the best method to make such
> > a comparison.
> >
> > Thank you,
> > Kayce