# [R] nested anova not giving expected results

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Mon Apr 15 16:48:45 CEST 2002

Matthew Norton <matthew.norton at umontreal.ca> writes:

> Hello all. This may be a trivially simple question to answer, but I'm a little
> bit stumped with respect to the calculation of the F statistics in nested
> anovas in R. If I understand correctly, the F statistic for the
> among-subgroups but within groups hypothesis is calculated as
> MS_subgroups/MS_error, while the F statistic for the factor is calculated as
> MS_factor/MS_subgroups (I'm getting this from Sokal & Rohlf's _Biometry_).
> However, as I understand the output from R, it calculates the F for the
> factor as MS_factor/MS_error, which can significantly change the results.
>
> As an example, I took the values from Sokal & Rohlf's example on mosquitos,
> which are as follows:
>
>    cage animal length
> 1     1      a   58.5
> 2     1      a   59.5
> 3     1      b   77.8
> 4     1      b   80.9
> 5     1      c   84.0
> 6     1      c   83.6
> 7     1      d   70.1
> 8     1      d   68.3
> 9     2      a   69.8
> 10    2      a   69.8
> 11    2      b   56.0
> 12    2      b   54.5
> 13    2      c   50.7
> 14    2      c   49.3
> 15    2      d   63.8
> 16    2      d   65.8
> 17    3      a   56.6
> 18    3      a   57.5
> 19    3      b   77.8
> 20    3      b   79.2
> 21    3      c   69.9
> 22    3      c   69.2
> 23    3      d   62.1
> 24    3      d   64.5
>
> Using the following R commands, I get this output for a nested anova:
>
> > model<-lm(length~cage/animal)
> > anova(model)
> Analysis of Variance Table
>
> Response: length
>             Df  Sum Sq Mean Sq F value    Pr(>F)
> cage         2  665.68  332.84  255.70 1.452e-10 ***
> cage:animal  9 1720.68  191.19  146.88 6.981e-11 ***
> Residuals   12   15.62    1.30
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> According to the book and my understanding of nested anovas, the F statistic
> for the cage:animal component is correct, but the F statistic for 'cage'
> should be 332.84/191.19, giving a value of 1.741 which is not significant,
> and highly different than 255.70.
>
> Perhaps I've misunderstood, but could someone explain to me what R is doing?

R is doing the same thing as SAS and Genstat and probably others: If
you don't specify that there are multiple error components, it assumes
that there is only one. So you get the decomposition of the sum of
squares with everything compared to the residual.

Effectively, this makes any test for a main effect meaningless if
that factor also appears in a significant interaction with another
factor. Logically, this makes sense: You cannot talk about an overall
cage effect if it differs between animals, *unless* you interpret
differences between animals as random.
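
As a quick check of the arithmetic, here is a minimal sketch that
recomputes the F ratio for cage by hand, treating animals within cages
as random. The mean squares and degrees of freedom are copied from the
anova table quoted above; the variable names are only illustrative.

```r
# Mean squares and degrees of freedom copied from the quoted anova() table.
ms_cage   <- 332.84; df_cage   <- 2   # cage
ms_animal <- 191.19; df_animal <- 9   # cage:animal (animals within cages)

# With animals treated as random, cage is tested against cage:animal
# rather than against the residual mean square.
f_cage <- ms_cage / ms_animal
p_cage <- pf(f_cage, df_cage, df_animal, lower.tail = FALSE)

print(c(F = f_cage, p = p_cage))
```

The ratio agrees with the 1.741 worked out in the question, on 2 and 9
degrees of freedom.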

To get a multistratum analysis try

summary(aov(length ~ cage + Error(cage:animal)))

(Notice that this only works out correctly for balanced designs. In
other cases, you may have to look into using lme().)
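
For anyone who wants to reproduce this, here is a self-contained
sketch using the data from the question. The data frame name `mosq`
and the object names are my own choices, and cage and animal are
assumed to be factors (the 2 Df for cage in the quoted output suggests
they were). R may warn that the Error() model is singular when the
error term is written as cage:animal; the strata should still come out
as intended.

```r
# Data as quoted above: 3 cages, 4 animals per cage, 2 measurements each.
mosq <- data.frame(
  cage   = factor(rep(1:3, each = 8)),
  animal = factor(rep(rep(c("a", "b", "c", "d"), each = 2), times = 3)),
  length = c(58.5, 59.5, 77.8, 80.9, 84.0, 83.6, 70.1, 68.3,
             69.8, 69.8, 56.0, 54.5, 50.7, 49.3, 63.8, 65.8,
             56.6, 57.5, 77.8, 79.2, 69.9, 69.2, 62.1, 64.5)
)

# Single error stratum: every term is compared to the residual mean square.
fit_lm <- lm(length ~ cage / animal, data = mosq)
tab <- anova(fit_lm)
print(tab)

# Multistratum fit: cage is tested within the cage:animal stratum,
# i.e. against the variation among animals within cages.
fit_aov <- aov(length ~ cage + Error(cage:animal), data = mosq)
print(summary(fit_aov))
```

The first table should match the output in the question; in the second,
the F for cage appears in the cage:animal stratum.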

--
O__  ---- Peter Dalgaard             Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
(*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907