[R] type III ANOVA for a nested linear model

S Ellison S.Ellison at lgc.co.uk
Thu Jul 12 12:07:38 CEST 2007


The aliasing problem arises in alias(), which is what Anova uses to detect aliasing. It is simply the fact that anova more or less blithely ignores the NA's that makes anova behave apparently more 'sensibly' than Anova.

But like Carsten, I found this difficult to understand. Unordered factors are supposed to be arbitrary. I can understand why A:C behaves differently from A:D (where D<-factor(rep(1:3,6)). A:C generates a 27-level factor with only 18 levels populated; A:D has only nine levels. But it is not so simple.

I was goig to suggest using AC<-factor(A:C) instead of A %in% C as a fix, as this generates a 9-level factor with all levels populated - just as A:D does. But 
> alias(lm(resp~A+AC))
generates an alias, where 
> alias(lm(resp~A+A:D))
does not. 

That seemed to me entirely bizarre. table(A,AC) and table(A,A:D) show identical balance and nesting. Something, somewhere, is expanding the two differently. But what?

model.matrix is the 'culprit', I think. The two model matrices differ; model.matrix(resp~A+A:D) has 9 columns, and model.matrix(resp~A+AC) has eleven.

So now I know what has happened. What I don't understand is why, except that 'that is what model.matrix does with this factor level numbering' (refactoring either AC or A:D via as.numeric finally generates identical - and aliased - behaviour)) . I think I am going to invoke the law of unintended consequences and go find a cold compress.

Steve E

>>> Peter Dalgaard <P.Dalgaard at biostat.ku.dk> 11/07/2007 16:03:13 >>>
>A term C %in% A  (or A/C) is not a _specification_ that C is nested in
>A, it is a _directive_ to include the terms A and C:A. Now, C:A involves
>a term for each combination of A and C, of which many are empty if C is
>strictly coarser than A. This may well be what is confusing Anova().

>In fact, with this (c(1:3,6:11)) coding of C, A:C is completely
>equivalent to C, but if you look at summary(lm(....)) you will see a lot
>of NA coefficients in the A:C case. If you use resp ~ A*B+C, then you
>still get a couple of missing coefficients in the C terms because of
>collinearity with the A terms. (Notice that this is one case where the
>order inside the model formula will matter; C+A*B is not the same.)


*******************************************************************
This email and any attachments are confidential. Any use, co...{{dropped}}



More information about the R-help mailing list