[Rd] scale factors/overdispersion in GLM: possible bug?

Ben Bolker bolker@zoo.ufl.edu
Wed, 19 Apr 2000 10:46:45 -0400 (EDT)

  I've been poking around with GLMs (on which I am *not* an expert) on
behalf of a student, particularly binomial (standard logit link) nested
models with overdispersion.

  I have one possible bug to report (but I'm not confident enough to be
*sure* it's a bug); one comment on the general inconsistency that seems to
afflict the various functions for dealing with overdispersion in GLMs
(anova.glm, drop1.glm, summary.glm); and one statistical query that maybe
someone would answer if they're feeling generous.

  1.  possible bug:
  in drop1.glm() with scale != 0, R seems to divide by the dispersion
parameter twice:

 first (if family != "gaussian") it sets

 loglik <- dev/dispersion

 then (if test == "Chisq") it calculates

     dev <- loglik - loglik[1]
     dev[nas] <- 1 - pchisq(dev[nas]/dispersion, aod$Df[nas])

(in addition, I suppose this could now be written as
pchisq(..., lower.tail=FALSE), which is more accurate for very small p-values)

  Is this a bug or am I missing something?
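A small R sketch of the arithmetic, reconstructed only from the two quoted lines (not the full drop1.glm() source). It assumes a non-gaussian family, so the first division applies. The deviances and the user-supplied dispersion of 2 are made up. If the code does what it appears to, the chi-square statistic becomes (dev - dev[1])/dispersion^2 rather than (dev - dev[1])/dispersion. The p-values are then too large when dispersion > 1 and too small when it is < 1. The last two lines show the precision point about lower.tail = FALSE.

```r
## Hypothetical reconstruction of the two quoted drop1.glm() steps with
## made-up numbers; this is NOT the drop1.glm() source.
dev        <- c(100, 112, 104)  # residual deviance: full model, then each single-term deletion
df_drop    <- c(NA, 1, 2)       # df of each dropped term (NA for the full model)
dispersion <- 2                 # user-supplied scale (scale != 0)
nas        <- !is.na(df_drop)   # rows that get a test

loglik <- dev / dispersion      # first division by the dispersion
stat   <- loglik - loglik[1]    # already equals (dev - dev[1]) / dispersion

## As quoted: divides by the dispersion a second time
p_twice <- 1 - pchisq(stat[nas] / dispersion, df_drop[nas])
## Dividing once, with the upper tail computed directly
p_once  <- pchisq(stat[nas], df_drop[nas], lower.tail = FALSE)

## Why lower.tail = FALSE helps far out in the tail:
tail_naive  <- 1 - pchisq(100, 1)                  # rounds to exactly 0
tail_direct <- pchisq(100, 1, lower.tail = FALSE)  # tiny but nonzero
```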

  2.  There seems to be a fair amount of variability in how drop1.glm,
anova.glm, stat.anova, summary.glm deal with dispersion parameters:
  summary.glm() has an optional parameter called "dispersion"
  stat.anova() has a parameter called "scale", which it doesn't even use
if test=="Chisq"
  anova.glm calls stat.anova() with an automatically calculated scale
parameter (sum(object$weights*object$residuals^2)/object$df.residual), not
allowing a user-defined scale
  drop1.glm() has an optional parameter called "scale"
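To make the anova.glm() default concrete, here is a sketch on simulated (made-up) binomial data. Working weights times squared working residuals reduce to squared Pearson residuals, so the automatic scale sum(object$weights*object$residuals^2)/object$df.residual is the Pearson chi-square over the residual df, up to IRLS convergence tolerance. The sketch also shows summary.glm()'s dispersion argument rescaling the standard errors by the square root of the value passed. It is written for a current version of R, so details may differ from the version discussed here.

```r
## Sketch on simulated, made-up data (not the student's analysis).
set.seed(1)
n    <- 40
size <- 20
x    <- seq(-2, 2, length.out = n)
p    <- plogis(-0.5 + 0.3 * x + rnorm(n, sd = 0.8))  # extra-binomial noise
succ <- rbinom(n, size, p)
fit  <- glm(cbind(succ, size - succ) ~ x, family = binomial)

## The automatic scale as described: working weights * working residuals^2
scale_auto    <- sum(fit$weights * fit$residuals^2) / fit$df.residual
## Pearson chi-square divided by residual df
scale_pearson <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

## summary.glm() takes a user-supplied value via its 'dispersion' argument
se_default <- coef(summary(fit))[, "Std. Error"]  # binomial: dispersion 1
se_scaled  <- coef(summary(fit, dispersion = scale_auto))[, "Std. Error"]
```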

  I would mess around and try to fix these myself, but (1) there seem to
be some design decisions to make here, and (2) I'm not sure enough of
myself to risk breaking things.

  3. There seems to be a variety of advice around on how to deal
with overdispersion in GLMs.  One could (1) calculate the scale parameter
from the residuals (either residual deviance/residual df or Pearson
chi-sq/residual df) and feed that back into the analysis; (2) use
quasi-likelihood with logit link and variance mu(1-mu); or (3) use a form of
F-test, as suggested by Crawley (presumably because the scale parameter
is estimated rather than known).  Are
these equivalent, or approximately equivalent?  Does anyone have a
favorite reference on the subject?  (I've been looking at V&R3, Crawley's
"GLIM for Ecologists", and Lindsey's GLM book -- I haven't acquired Dobson
or McCullagh and Nelder yet).
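A rough comparison of options (1)-(3) on simulated overdispersed data: made-up logit-normal noise on top of a binomial model. quasibinomial is used as R's packaged form of the logit-link, mu(1-mu)-variance quasi-likelihood; the general quasi() constructor is an alternative. This only shows what each option does in R, not whether the options are equivalent in general. The quasi-likelihood fit keeps the binomial coefficients. Its dispersion is the Pearson-based estimate from option (1), and its standard errors are the binomial ones multiplied by sqrt(dispersion). anova(..., test = "F") gives an F-test in the spirit of option (3).

```r
## Sketch on simulated, made-up data; names here are illustrative only.
set.seed(2)
n    <- 40
size <- 20
x    <- seq(-2, 2, length.out = n)
p    <- plogis(-0.5 + 0.3 * x + rnorm(n, sd = 0.8))  # extra-binomial noise
succ <- rbinom(n, size, p)
resp <- cbind(succ, size - succ)

fit_bin <- glm(resp ~ x, family = binomial)
fit_qb  <- glm(resp ~ x, family = quasibinomial)

## Option (1): the two residual-based scale estimates
phi_dev     <- deviance(fit_bin) / df.residual(fit_bin)
phi_pearson <- sum(residuals(fit_bin, type = "pearson")^2) / df.residual(fit_bin)

## Option (2): quasi-likelihood, with dispersion estimated by summary.glm()
phi_qb <- summary(fit_qb)$dispersion
se_bin <- coef(summary(fit_bin))[, "Std. Error"]
se_qb  <- coef(summary(fit_qb))[, "Std. Error"]

## Option (3): F-test for dropping x, using the estimated dispersion
fit_qb0 <- glm(resp ~ 1, family = quasibinomial)
f_tab   <- anova(fit_qb0, fit_qb, test = "F")
```

Note that phi_dev and phi_pearson are generally not identical, so options (1) and (2) agree exactly only when the Pearson-based estimate is the one fed back in.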

r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch