[R] two way ANOVA with unequal sample sizes

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Tue Oct 16 22:44:40 CEST 2001

```
julien claude <claude at isem.univ-montp2.fr> writes:

> Hi,
>
> I am trying a two way anova with unequal sample sizes but results are not
> as expected:
>
> I take the example from Applied Linear Statistical Models (Neter et al.
> pp889-897, 1996)
>
> growth rate		gender	bone development
> 1.4			1		1
> 2.4			1		1
> 2.2			1		1
> 2.4			1		2
> 2.1			2		1
> 1.7			2		1
> 2.5			2		2
> 1.8			2		2
> 2			2		2
> 0.7			3		1
> 1.1			3		1
> 0.5			3		2
> 0.9			3		2
> 1.3			3		2
>
> expected results are
>
> source of variation	SS	df	MS	F
> gender			0.12	1	0.12	0.74
> bone development	4.1897	2	2.0949	12.89**
> interaction		0.0754	2	0.0377	0.23
> Error			1.3	8	0.1625
>
> # I use
> aov (growrate ~ gender * bonedevelopment)->m
> summary(m)
>
>    						  Df      Sum Sq    Mean Sq 	F value   Pr(>F)
> as.factor(gender)                     			2 	4.3063  	2.1531 		13.2501
> 0.002891 **
> as.factor(bonedevlopment)            		1 	0.0926  	0.0926  		0.5697
> 0.472022
> as.factor(gender:bonedevlopment)  		 2 	0.0754 	 0.0377 	 	0.2321	 0.798034
> Residuals                            			8 	1.3000 	 0.1625

Ahem. Tab damage detected... and your command and output don't match
up.

The as.factor(gender:bonedevlopment) is playing with fire... You
should calculate factor() of each term. However, it would seem that
you already did manage to convert things to factors or you would have
gotten something to this effect:

> evalq(as.factor(gender:bone.development),d)
[1] 1
Levels:  1
Warning messages:
1: Numerical expression has 14 elements: only the first used in:
gender:bone.development
2: Numerical expression has 14 elements: only the first used in:
gender:bone.development

>
> #if I change the order of factors, results are different
> aov (growrate ~ bonedevelopment * gender)->m
> summary(m)
>
>                                       		Df 	Sum Sq Mean Sq 	F value
> Pr(>F)
> as.factor(bonedevlopment)             	1 	0.0029  0.0029 		 0.0176
> 0.897785
> as.factor(gender)                    		2 	4.3960  2.1980 		13.5262 0.002713 **
> as.factor(gender:bonedevlopment)  	2 	0.0754  0.0377  		0.2321   0.798034
> Residuals                           		8 	1.3000  0.1625
>
> #In both cases, results for main effects differ from those expected in
> Neter et al.
> However, interaction and residuals are well estimated.
> Can anyone help? Either I am wrong in the formula, or there is some
> other problem. Is there an easy way to conduct the test as it is done in
> Neter et al.?
> The same problem occurs with anova(lm(....))?

I don't think we're the ones with the problem... There are various
boneheaded ways in which people try to assign some kind of SumSq to
main effects in the presence of interaction, and they are all wrong -
although maybe not very wrong if the imbalance is slight.

The tests *should* depend on the test order, as is most clearly seen
if the predictors are highly collinear.

--
O__  ---- Peter Dalgaard             Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
(*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

```
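
The points in the exchange above can be checked with a short R sketch. It is only an illustration under stated assumptions, not the method of either poster: the data frame `d` and the names `a1`, `a2`, `fit_sum` and `m` are made up here, and the column labels are kept exactly as posted (note that the column labelled gender has three levels while the expected table gives gender 1 df, so the two labels appear to be swapped in the post). Each variable is converted with `factor()` on its own, the sequential tables are computed in both orders, and `drop1()` with sum-to-zero contrasts is used as one possible way to obtain marginal sums of squares of the kind shown in the expected table.

```r
# Data as posted in the quoted message (column labels kept as posted).
d <- data.frame(
  growrate = c(1.4, 2.4, 2.2, 2.4, 2.1, 1.7, 2.5, 1.8, 2.0,
               0.7, 1.1, 0.5, 0.9, 1.3),
  gender   = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3),
  bone     = c(1, 1, 1, 2, 1, 1, 2, 2, 2, 1, 1, 2, 2, 2)
)

# Convert each term to a factor separately, rather than wrapping
# gender:bone in as.factor().
d$gender <- factor(d$gender)
d$bone   <- factor(d$bone)

# Sequential sums of squares: each term is adjusted only for the
# terms before it, so the main-effect rows depend on the order.
a1 <- anova(lm(growrate ~ gender * bone, data = d))
a2 <- anova(lm(growrate ~ bone * gender, data = d))
print(a1)
print(a2)

# Marginal sums of squares: each term is dropped from the full model
# in turn. Sum-to-zero contrasts are set explicitly because the
# main-effect rows depend on the coding when an interaction is present.
fit_sum <- lm(growrate ~ gender * bone, data = d,
              contrasts = list(gender = "contr.sum", bone = "contr.sum"))
m <- drop1(fit_sum, scope = . ~ ., test = "F")
print(m)
```

On these data the two sequential tables agree with the output quoted in the message (4.3063 then 0.0926 in one order, 0.0029 then 4.3960 in the other), and the `drop1()` table gives 0.12 and 4.1897 for the main effects, as in the expected table. The interaction and residual rows are the same in all three. Whether the marginal main-effect tests are meaningful when an interaction is in the model is exactly what the reply disputes, so this shows where the textbook numbers come from rather than recommending them.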