[R] type III ANOVA for a nested linear model

Wed Jul 11 17:03:13 CEST 2007

Carsten Jaeger wrote:
> Hello Peter,
>
> thanks for your help. I'm quite sure that I specified the right model.
> Factor C is indeed nested within factor A. I think you were confused by
> the numbering of C (1..11), and it is easier to understand when I code
> it as you suggested (1,2,3 within each level of A, as in mydata1 [see
> below]). However, it does not matter which numbering I choose for
> carrying the analysis, as
>
> anova(lm(resp ~ A * B + (C %in% A), mydata))
> anova(lm(resp ~ A * B + (C %in% A), mydata1))
>
> both give the same results (as at least I had expected because of the
> nesting).
>
> However, I found that Anova() from the car package only accepts the
> second version. So,
>
> Anova(lm(resp ~ A * B + (C %in% A), mydata)) does not work (giving an
> error) but
> Anova(lm(resp ~ A * B + (C %in% A), mydata1)) does.
>
> This behaviour is rather confusing, or is there anything I'm missing?
>
>   
You're not listening to what I told you!

A term C %in% A  (or A/C) is not a _specification_ that C is nested in
A, it is a _directive_ to include the terms A and C:A. Now, C:A involves
a term for each combination of A and C, of which many are empty if C is
strictly coarser than A. This may well be what is confusing Anova().

In fact, with this (c(1:3,6:11)) coding of C, A:C is completely
equivalent to C, but if you look at summary(lm(....)) you will see a lot
of NA coefficients in the A:C case. If you use resp ~ A*B+C, then you
still get a couple of missing coefficients in the C terms because of
collinearity with the A terms. (Notice that this is one case where the
order inside the model formula will matter; C+A*B is not the same.)

Whether you'd want C as a random factor is a different matter. It is
often the natural model if C is "subject" and A is "group". Let's assume
that this is the case: In an ordinary linear model, you can test whether
you can remove C (or A:C) , which implies that all subjects in the same
group have the same level of the response. In your case, the hypothesis
is accepted, but the F statistic is around 3 (on (6, 6) DF) , which
suggests that there might be some variation of subjects within groups.
In a mixed-effects model, you assume that this variation exists and
therefore you use the SSD for C as the denominator when testing A, which
is arguably safer than pooling it with the somewhat smaller residual SSD.

> Thanks for your help again, 
>
> Carsten
>
>
> R> mydata
>       A     B      C resp
> 1     1     1      1 34.12
> 2     1     1      2 32.45
> 3     1     1      3 44.55
> 4     1     2      1 20.88
> 5     1     2      2 22.32
> 6     1     2      3 27.71
> 7     2     1      6 38.20
> 8     2     1      7 31.62
> 9     2     1      8 38.71
> 10    2     2      6 18.93
> 11    2     2      7 20.57
> 12    2     2      8 31.55
> 13    3     1      9 40.81
> 14    3     1     10 42.23
> 15    3     1     11 41.26
> 16    3     2      9 28.41
> 17    3     2     10 24.07
> 18    3     2     11 21.16
>
> R> mydata1
>       A     B      C resp
> 1     1     1      1 34.12
> 2     1     1      2 32.45
> 3     1     1      3 44.55
> 4     1     2      1 20.88
> 5     1     2      2 22.32
> 6     1     2      3 27.71
> 7     2     1      1 38.20
> 8     2     1      2 31.62
> 9     2     1      3 38.71
> 10    2     2      1 18.93
> 11    2     2      2 20.57
> 12    2     2      3 31.55
> 13    3     1      1 40.81
> 14    3     1      2 42.23
> 15    3     1      3 41.26
> 16    3     2      1 28.41
> 17    3     2      2 24.07
> 18    3     2      3 21.16
>
>
> On Tue, 2007-07-10 at 13:54 +0200, Peter Dalgaard wrote:
>   
>> Carsten Jaeger wrote:
>>     
>>> Hello,
>>>
>>> is it possible to obtain type III sums of squares for a nested model as
>>> in the following:
>>>
>>> lmod <- lm(resp ~ A * B + (C %in% A), mydata))
>>>
>>> I have tried
>>>
>>> library(car)
>>> Anova(lmod, type="III")
>>>
>>> but this gives me an error (and I also understand from the documentation
>>> of Anova as well as from a previous request
>>> (http://finzi.psych.upenn.edu/R/Rhelp02a/archive/64477.html) that it is
>>> not possible to specify nested models with car's Anova).
>>>
>>> anova(lmod) works, of course.
>>>
>>> My data (given below) is balanced so I expect the results to be similar
>>> for both type I and type III sums of squares. But are they *exactly* the
>>> same? The editor of the journal which I'm sending my manuscript to
>>> requests what he calls "conventional" type III tests and I'm not sure if
>>>   
>>> can convince him to accept my type I analysis.
>>>       
>> In balanced designs, type I-IV SSD's are all identical. However, I don't think the model does what I think you think it does. 
>>
>> Notice that "nesting" is used with two diferent meanings, in R it would be that the codings of C only makes sense within levels of A - e.g. if they were numbered 1:3 within each group, but with C==1 when A==1 having nothing to do with C==1 when A==2.  SAS does something. er. else...
>>
>> What I think you want is a model where C is a random terms so that main effects of A can be tested, like in
>>
>>     
>>> summary(aov(resp ~ A * B + Error(C), dd))
>>>       
>> Error: C
>>           Df  Sum Sq Mean Sq F value Pr(>F)
>> A          2  33.123  16.562  0.4981 0.6308
>> Residuals  6 199.501  33.250
>>
>> Error: Within
>>           Df Sum Sq Mean Sq F value   Pr(>F)
>> B          1 915.21  915.21 83.7846 9.57e-05 ***
>> A:B        2  16.13    8.07  0.7384   0.5168
>> Residuals  6  65.54   10.92
>> ---
>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>>
>> (This is essentially the same structure as Martin Bleichner had earlier today, also @web.de. What is this? an epidemic? ;-))
>>
>>
>>     
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907