[Rd] quick question about glm() example

Wed May 29 21:05:23 CEST 2013

On May 29, 2013, at 19:58 , Ben Bolker wrote:

> 
>  I don't have a copy of Dobson (1990) from which the glm.D93 example is
> taken in example("glm"), but I'm strongly suspecting that these are
> made-up data rather than real data; the means of the responses within
> each treatment are _identical_ (equal to 16 2/3), so two of the
> parameters are estimated as being zero (within machine tolerance).  (At
> this moment I don't understand why the means rather than the geometric
> means being identical is what matters ...)

I don't have it to hand either (although I could get to it tomorrow -- a colleague is teaching from it). However, I don't think there's anything particularly fake about the data, but they do seem to come from a designed trial, in which 3 groups of 50 people are assigned different treatments and a 3-level outcome is recorded. 

So in reality, it is a model for three multinomials, but there's a standard "trick" that this has likelihood inference equivalent to that of a multiplicative Poisson model. The likelihood for the Poisson model factorizes into the marginal distribution of the treatment totals and the conditional distribution of the table given the totals. The latter is the set of multinomials. If you ensure a suitable separation of parameters and that the model for the marginals is saturated, you end up with likelihood inference for the conditional distributions. (This may require some pen-and-paper work to become intelligible...)

The ML parameters in a multiplicative Poisson model (lambda_ij=alpha_i*beta_j) are proportional to the row and column means, and in the GLM framework parameters are the log of that, subject to some resolution of the overparametrization issue. 

The outcome parameters are likely of no interest, since they describe log-ratios of frequencies of the outcome categories, so the whole thing is about the goodness of fit, as measured by the

Residual deviance:  5.1291  on 4  degrees of freedom

which is the LRT equivalent of the Pearson Chi-square for the table:

> chisq.test(matrix(d.AD$counts,3))

	Pearson's Chi-squared test

data:  matrix(d.AD$counts, 3)
X-squared = 5.1732, df = 4, p-value = 0.27

In fact, Rao's score test for an interaction term is exactly the Pearson Chisquare:

> anova(glm.D93, update(glm.D93,~outcome*treatment), test="Rao")
Analysis of Deviance Table

Model 1: counts ~ outcome + treatment
Model 2: counts ~ outcome + treatment + outcome:treatment
  Resid. Df Resid. Dev Df Deviance    Rao Pr(>Chi)
1         4     5.1291                            
2         0     0.0000  4   5.1291 5.1732     0.27

> 
>  This therefore feels like a somewhat strange (i.e. non-generic)
> example to use (although I know it's been that way for a long time).
> 
>  Perhaps more importantly, the example illustrates the use of glm()
> without a data= argument, which I think should not generally be
> encouraged.  I would prefer to see the example written as:
> 
> d.AD <- data.frame(
> counts=c(18,17,15,20,10,20,25,13,12),
> outcome=gl(3,1,9),
> treatment=gl(3,3))
> 
> print(d.AD)
> 
> glm.D93 <- glm(counts ~ outcome + treatment, family = poisson(),
>       data=d.AD)
> 
>  Thoughts?
> 
>  Ben Bolker
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com