[Rd] nobs() with glm(family="poisson")

Milan Bouchet-Valat nalimilan at club.fr
Wed Feb 27 19:46:51 CET 2013


I cannot believes nobody cares about this -- or I'm completely wrong and
in that case everybody should rush to put the shame on me... :-p

In the meantime, I have come up with an alternative way of fixing this:
when modeling count data, glm() could allow users to pass a table as the
data argument, and convert it to a data frame using
as.data.frame.table() instead of requiring the user to do it
beforehand[1]. This would become the recommended way of fitting models
for count data, and the fact that a table is passed could be used as the
sign that nobs() should return the sum of cell counts instead of the
number of rows in the data.frame.


Regards


1: gnm already supports this pattern, with the additional advantage that
e.g. fitted(), predict(), residuals() and weights() return an object of
the same dimensions and dimnames as the original table.


Le lundi 18 février 2013 à 12:22 +0100, Milan Bouchet-Valat a écrit :
> Hi!
> 
> The nobs() method for glm objects always returns the number of cases
> with non-null weights in the data, which does not correspond to the
> number of observations for Poisson regression/log-linear models, i.e.
> when family="poisson" or family="quasipoisson".
> 
> This sounds dangerous since nobs() is, as the documentation states,
> primarily aimed at computing the Bayesian information criterion. Raftery
> (1995:20) warned against this:
> > What should n be? Once again, it is best to use the actual number of
> > individuals, i.e. the sum of the cell counts, and not the number of
> > cells (Raftery, 1986a).
> 
> Is there a reason why this should not/cannot be done that way?
> 
> This behavior can be reproduced with with R 3.0.0 from SVN, using the
> example from ?glm:
> counts <- c(18,17,15,20,10,20,25,13,12)
> outcome <- gl(3,1,9)
> treatment <- gl(3,3)
> glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
> nobs(glm.D93)
> # 9 == length(counts)
> # Should be 150 == sum(counts)
> 
> FWIW, stats:::nobs.glm is currently defined as:
> nobs.glm <- function (object, ...) 
>     if (!is.null(w <- object$prior.weights)) sum(w != 0) else length(object$residuals)
> 
> 
> Thanks!
> 
> 
> Raftery, Adrian E. 1995. “Bayesian Model Selection in Social Research.”
> Sociological methodology 25:111–96.
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



More information about the R-devel mailing list