[Rd] [RfC] Family dispersion

Luis Carvalho lexcarvalho at gmail.com
Thu Jun 2 04:25:28 CEST 2016


I'd like to hear your opinion about the following proposal to make the
computation of dispersion in GLMs more flexible. Dispersion is used in
summary.glm; the relevant code chunk with the dispersion calculation is listed
below (from glm.R):

summary.glm <- function(object, dispersion = NULL,
			correlation = FALSE, symbolic.cor = FALSE, ...)
  est.disp <- FALSE
  df.r <- object$df.residual
  if(is.null(dispersion))	# calculate dispersion if needed
    dispersion <-
      if(object$family$family %in% c("poisson", "binomial"))  1
      else if(df.r > 0) {
        est.disp <- TRUE
          warning("observations with zero weight not used for calculating dispersion")
        sum((object$weights*object$residuals^2)[object$weights > 0])/ df.r
      } else {
        est.disp <- TRUE
  # ...

Many exponential families have unit dispersion, or can be cast to have unit
dispersion, e.g. hypergeometric, negative binomial, and so on. However,
summary.glm only assigns unit dispersion to Poisson and binomial families, as
the code above indicates. My suggestion is to make this check more general by
having a 'dispersion' slot in the family class; for instance, we would have
poisson(...)$dispersion = 1 and binomial(...)$dispersion = 1. The updated
summary.glm would be:

default.dispersion <- function (object, ...) {
  df.r <- object$df.residual
  if (df.r > 0) {
    if (any(object$weights == 0))
      warning("observations with zero weight not used for calculating dispersion")
    sum((object$weights * object$residuals ^ 2)[object$weights > 0]) / df.r
  else NaN

summary.glm <- function(object, dispersion = default.dispersion,
			correlation = FALSE, symbolic.cor = FALSE, ...)
  if (!is.null(object$family$dispersion)) # use family dispersion?
    dispersion <- object$family$dispersion
  est.disp <- is.function(dispersion)
  dispersion <- if (est.disp) dispersion(object, ...) else dispersion

  df.r <- object$df.residual
  # ... (unchanged code below)

Note that 'dispersion' can be a function taking a glm object or a number
(e.g. 1). Here are some examples:

R> library(MASS)
R> gm <- glm(formula, family=Gamma())
R> summary(gm, dispersion = gamma.dispersion) # ML estimate of dispersion

R> set.dispersion <- function (fam, disp) # update family dispersion
R>   structure(within(unclass(fam), dispersion <- disp), class = "family")
R> gm <- glm(formula, family=set.dispersion(Gamma(), gamma.dispersion))
R> summary(gm) # use family dispersion
R> Exp <- function (...) set.dispersion(Gamma(...), 1)

Thanks in advance for the feedback.


Computers are useless. They can only give you answers.
                -- Pablo Picasso

Luis Carvalho
Associate Professor
Dept. of Mathematics and Statistics
Boston University

More information about the R-devel mailing list