[Rd] quick question about glm() example

Ben Bolker bbolker at gmail.com
Wed May 29 22:09:09 CEST 2013


On 13-05-29 03:05 PM, peter dalgaard wrote:
> 
> On May 29, 2013, at 19:58 , Ben Bolker wrote:
> 
>> 
>> I don't have a copy of Dobson (1990) from which the glm.D93 example
>> is taken in example("glm"), but I'm strongly suspecting that these
>> are made-up data rather than real data; the means of the responses
>> within each treatment are _identical_ (equal to 16 2/3), so two of
>> the parameters are estimated as being zero (within machine
>> tolerance).  (At this moment I don't understand why the means
>> rather than the geometric means being identical is what matters
>> ...)
> 
> I don't have it to hand either (although I could get to it tomorrow
> -- a colleague is teaching from it). However, I don't think there's
> anything particularly fake about the data, but they do seem to come
> from a designed trial, in which 3 groups of 50 people are assigned
> different treatments and a 3-level outcome is recorded.
> 
> So in reality, it is a model for three multinomials, but there's a
> standard "trick" that this has likelihood inference equivalent to
> that of a multiplicative Poisson model. The likelihood for the
> Poisson model factorizes into the marginal distribution of the
> treatment totals and the conditional distribution of the table given
> the totals. The latter is the set of multinomials. If you ensure a
> suitable separation of parameters and that the model for the
> marginals is saturated, you end up with likelihood inference for the
> conditional distributions. (This may require some pen-and-paper work
> to become intelligible...)
> 
> The ML parameters in a multiplicative Poisson model
> (lambda_ij=alpha_i*beta_j) are proportional to the row and column
> means, and in the GLM framework parameters are the log of that,
> subject to some resolution of the overparametrization issue.
> 
> The outcome parameters are likely of no interest, since they describe
> log-ratios of frequencies of the outcome categories, so the whole
> thing is about the goodness of fit, as measured by the
> 
> Residual deviance:  5.1291  on 4  degrees of freedom
> 
> which is the LRT equivalent of the Pearson Chi-square for the table:
> 
>> chisq.test(matrix(d.AD$counts,3))
> 
> Pearson's Chi-squared test
> 
> data:  matrix(d.AD$counts, 3) X-squared = 5.1732, df = 4, p-value =
> 0.27
> 
> In fact, Rao's score test for an interaction term is exactly the
> Pearson Chisquare:
> 
>> anova(glm.D93, update(glm.D93,~outcome*treatment), test="Rao")
> Analysis of Deviance Table
> 
> Model 1: counts ~ outcome + treatment Model 2: counts ~ outcome +
> treatment + outcome:treatment Resid. Df Resid. Dev Df Deviance    Rao
> Pr(>Chi) 1         4     5.1291 2         0     0.0000  4   5.1291
> 5.1732     0.27

  Thanks, that's extremely helpful.  The example still seems slightly
non-generic as a very first example to use with ?glm, but it makes more
sense now. [My (increasingly tiny) nit-picks would be that
anova(glm.D93) and summary(glm.D93) are presented without comment,
implying that they would be sensible things to try in this case, rather
than the appropriate statistical tests, which would be the Rao test
above or the LRT equivalent:
anova(glm.D93,update(glm.D93,~outcome*treatment),test="Chisq").  I would
say one should try to pick an example where a simple application of the
standard accessors (anova(), summary(), drop1()) would be more easily
interpretable.]

  Any opinions about the use or non-use of the data= argument?

 Ben Bolker


> 
> 
> 
>> 
>> This therefore feels like a somewhat strange (i.e. non-generic) 
>> example to use (although I know it's been that way for a long
>> time).
>> 
>> Perhaps more importantly, the example illustrates the use of glm() 
>> without a data= argument, which I think should not generally be 
>> encouraged.  I would prefer to see the example written as:
>> 
>> d.AD <- data.frame( counts=c(18,17,15,20,10,20,25,13,12), 
>> outcome=gl(3,1,9), treatment=gl(3,3))
>> 
>> print(d.AD)
>> 
>> glm.D93 <- glm(counts ~ outcome + treatment, family = poisson(), 
>> data=d.AD)
>> 
>> Thoughts?
>> 
>> Ben Bolker
>> 
>> ______________________________________________ 
>> R-devel at r-project.org mailing list 
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>



More information about the R-devel mailing list