[R] ss's are incorrect from aov with multiple factors (EXAMPLE!)

Sat Jul 12 12:37:20 CEST 2003

John Christie <jc at or.psychology.dal.ca> writes:

> OK, I do see that there is a problem in my first email.  I have
> noticed this with repeated measures designs.  Otherwise, of course,
> there is only one error term for all factors.  But, with repeated
> measures designs this is not the case.
> 
> 
> On Friday, July 11, 2003, at 10:00  PM, Spencer Graves wrote:
> 
> > 	  People tend to get the quickest and most helpful responses
> > when they provide a toy problem that produces what they think are
> > anomalous results
> 
> Here is an admittedly poor example with factors a and b and s subjects.
> 
> a <- factor(rep(c(0, 1), 12))
> b <- factor(rep(c(0, 0, 1, 1), 6))
> s <- factor(rep(1:6, each = 4))
> x <- c(49.5, 62.8, 46.8, 57, 59.8, 58.5, 55.5, 56, 62.8, 55.8, 69.5,
>        55, 62, 48.8, 45.5, 44.2, 52, 51.5, 49.8, 48.8, 57.2, 59, 53.2, 56)
> 
> now
> 
> summary(aov(x~a*b+Error(s/(a*b))))
> 
> gives a table of results. But if one wanted to generate a confidence
> interval for factor b, one needs to reanalyze the results as follows:
> 
> ss<-aggregate(x, list(s=s, b=b), mean)
> summary(aov(x~b+Error(s/b), data=ss))
> 
> This yields an error term half the size of that reported for b in the
> combined ANOVA.  I would suggest that the way the ss and MSE are
> reported is erroneous since they should be able to be used to directly
> calculate confidence intervals or make mean comparisons without having
> to collapse and reanalyze for every effect.
> 
> Furthermore, I am guessing that this problem makes it impossible to
> get a correct average MSE that includes the interaction term.  OK, far
> from impossible, but very difficult to verify that the term is correct.
> 
> NOTE: F for b is the same in the first ANOVA and the second.

As far as I can tell, yes, you get different results if you analyse
the original data than if you collapse by taking means over the a
factor, and no, you should not expect otherwise. The various SS in the
full analysis are squared distances in 24-dimensional space, whereas in
the aggregated analysis they are squared distances in 12-dimensional
space. The relation is that every value entering the b and s:b terms
is duplicated in the former (once for each level of a), hence the SS
is twice as big.
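The doubling can be checked on John's example with a minimal R sketch using only base `aov`, `aggregate` and `summary`. It assumes the stratum tables are looked up by the names `summary()` prints (e.g. "Error: s:b"); the variable names `full`, `agg`, `full_sb`, `agg_sb` and the standard-error and interval lines are illustrative additions, not from the original posts. The last part suggests that a confidence interval for b comes out the same from either table, provided each b mean is counted as resting on 12 observations in the full analysis and on 6 in the collapsed one.

```r
# Data from the original post: a and b are within-subject factors, s is subject
a <- factor(rep(c(0, 1), 12))
b <- factor(rep(c(0, 0, 1, 1), 6))
s <- factor(rep(1:6, each = 4))
x <- c(49.5, 62.8, 46.8, 57, 59.8, 58.5, 55.5, 56, 62.8, 55.8, 69.5,
       55, 62, 48.8, 45.5, 44.2, 52, 51.5, 49.8, 48.8, 57.2, 59, 53.2, 56)

# Full analysis on all 24 observations; keep the s:b stratum table
full <- summary(aov(x ~ a * b + Error(s / (a * b))))
full_sb <- full[["Error: s:b"]][[1]]   # rows: b, then Residuals

# Collapse over a to 12 subject-by-b means and reanalyse
ss <- aggregate(x, list(s = s, b = b), mean)
agg <- summary(aov(x ~ b + Error(s / b), data = ss))
agg_sb <- agg[["Error: s:b"]][[1]]     # rows: b, then Residuals

# Every subject-by-b mean enters the full analysis twice (once per level of a),
# so both sums of squares in this stratum should be exactly doubled
ss_ratio <- as.numeric(full_sb[["Sum Sq"]] / agg_sb[["Sum Sq"]])
print(ss_ratio)

# The factor of 2 cancels in F = MS(b) / MS(Residuals)
f_full <- as.numeric(full_sb[["F value"]][1])
f_agg  <- as.numeric(agg_sb[["F value"]][1])

# It also cancels in the standard error of the difference between the two
# b means: 12 observations per b level in the full data, 6 in the collapsed data
se_full <- as.numeric(sqrt(full_sb[["Mean Sq"]][2] * 2 / 12))
se_agg  <- as.numeric(sqrt(agg_sb[["Mean Sq"]][2] * 2 / 6))

# Hypothetical 95% interval for the b effect (level 1 minus level 0), t on 5 df
d_b  <- unname(diff(tapply(x, b, mean)))
ci_b <- d_b + c(-1, 1) * qt(0.975, df = as.numeric(full_sb[["Df"]][2])) * se_full
print(ci_b)
```

In other words, the full table is not wrong; its MS simply has to be paired with the number of observations that actually go into each mean in the 24-observation layout.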

This is standard procedure, and R does the same as e.g. Genstat in
this respect. It is also necessary to ensure that the residual MS are
comparable, e.g. so that you can test for a significant s:b random
effect by comparing the residual MS of the s:b stratum with that of
the s:a:b stratum.
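A minimal sketch of such a comparison on the same data, assuming a model in which s:b is a random effect and the s:a:b residual estimates pure error; with one observation per cell the s:b residual MS then has expectation sigma^2 + 2 * sigma_sb^2 against sigma^2 for s:a:b, so a one-sided F test on (5, 5) df applies. The names `ms_sb`, `ms_sab`, `f_sb` and `p_sb` are illustrative only. The final lines show that the collapsed analysis reports half of the s:b residual MS, which would no longer be on the same scale as the s:a:b residual.

```r
# Same data as in the original post
a <- factor(rep(c(0, 1), 12))
b <- factor(rep(c(0, 0, 1, 1), 6))
s <- factor(rep(1:6, each = 4))
x <- c(49.5, 62.8, 46.8, 57, 59.8, 58.5, 55.5, 56, 62.8, 55.8, 69.5,
       55, 62, 48.8, 45.5, 44.2, 52, 51.5, 49.8, 48.8, 57.2, 59, 53.2, 56)

full <- summary(aov(x ~ a * b + Error(s / (a * b))))

# The Residuals row is the last row of each stratum table
res_sb  <- full[["Error: s:b"]][[1]]
res_sab <- full[["Error: s:a:b"]][[1]]
ms_sb  <- as.numeric(tail(res_sb[["Mean Sq"]], 1))
df_sb  <- as.numeric(tail(res_sb[["Df"]], 1))
ms_sab <- as.numeric(tail(res_sab[["Mean Sq"]], 1))
df_sab <- as.numeric(tail(res_sab[["Df"]], 1))

# Under H0 (no s:b variance) both MS estimate the same error variance,
# so large values of the ratio point to a random s:b component
f_sb <- ms_sb / ms_sab
p_sb <- pf(f_sb, df_sb, df_sab, lower.tail = FALSE)
cat("F =", f_sb, "on", df_sb, "and", df_sab, "df, p =", p_sb, "\n")

# The collapsed analysis gives half of ms_sb for the same stratum
ss <- aggregate(x, list(s = s, b = b), mean)
agg <- summary(aov(x ~ b + Error(s / b), data = ss))
ms_sb_agg <- as.numeric(tail(agg[["Error: s:b"]][[1]][["Mean Sq"]], 1))
```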

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907