[R] Discrepancy between R and SPSS in 2-way, repeated measures ANOVA

John Maindonald john.maindonald at anu.edu.au
Tue Sep 13 02:48:02 CEST 2005


For the record, it turns out that EXPNO ran from 1 to 20, i.e., it
identified the subject.

Thus EXPNO/COND parsed into the two error terms (additional to residual)
EXPNO and EXPNO:COND.  This second error term accounts for all
variation between levels of COND; so there is no COND sum of squares.
(In SPSS the fixed effect COND may have taken precedence; I do not
know for sure.)
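
The expansion of the nested term can be checked from the formula alone,
without any data. A minimal sketch (the one-sided formula is used here
only for illustration; it is not the poster's model):

```r
## The nesting operator a/b is shorthand for a + a:b
tl <- attr(terms(~ EXPNO/COND), "term.labels")
print(tl)
```

Inside Error(), each of the listed terms defines its own error stratum,
in addition to the residual (within) stratum.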

In R, if this were a completely randomized design, the term Error(EXPNO),
or in the mock-up example I gave Error(subj), would be enough on its own.

The R implementation can handle error terms akin to Error(REPNO/subj),
but because there are redundant model matrix columns generated by the
REPNO:subj term, complains that the Error() model is singular.

In general, terms of the form a/b should be used only if b is nested
within a, i.e.,
REPNO/IndividualWithinBlock
(where IndividualWithinBlock runs from 1 to 4)
not REPNO/subj.
(Either of these causes REPNO to be treated as a blocking factor.)
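
One way to see the redundancy is to compare the number of model matrix
columns with the matrix rank for the two codings. This is a sketch on a
hypothetical layout (5 blocks, 4 individuals per block, 2 times); the
data frame d and the names X_subj, X_nest and dims are made up for the
illustration:

```r
## Hypothetical layout: 5 blocks (REPNO), 4 individuals per block,
## each individual measured at 2 times.
d <- expand.grid(REPNO = factor(letters[1:5]),
                 IndividualWithinBlock = factor(1:4),
                 TIME = factor(1:2))

## subj labels all 20 individuals uniquely, so it already carries the
## block information; it is not coded as "individual within block".
d$subj <- interaction(d$REPNO, d$IndividualWithinBlock, drop = TRUE)

X_subj <- model.matrix(~ REPNO/subj, data = d)
X_nest <- model.matrix(~ REPNO/IndividualWithinBlock, data = d)

## Compare column counts with ranks for the two codings
dims <- c(cols_subj = ncol(X_subj), rank_subj = qr(X_subj)$rank,
          cols_nest = ncol(X_nest), rank_nest = qr(X_nest)$rank)
print(dims)
```

With REPNO/subj there are many more columns than the rank, because most
REPNO:subj combinations never occur. With REPNO/IndividualWithinBlock
the columns and the rank agree.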

 > xy <- expand.grid(REPNO=letters[1:5], COND=letters[1:4],
+                                    TIME=factor(paste(1:2)))
 > xy$subj <- factor(paste(xy$REPNO, xy$COND, sep=":"))
 > ## Below subj becomes EXPNO
 > xy$COND <- factor(xy$COND)
 > xy$REPNO <- factor(xy$REPNO)
 > xy$y <- rnorm(40)
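
For completeness, the two forms of error term discussed above might be
fitted to this mock-up roughly as follows. This is a sketch only: the
seed, the IndividualWithinBlock column and the names fit_crd and fit_blk
are my additions for illustration, and y is pure noise, so only the
layout of the strata and the degrees of freedom are of interest.

```r
set.seed(1)  # arbitrary seed, only so that the sketch is reproducible
xy <- expand.grid(REPNO = letters[1:5], COND = letters[1:4],
                  TIME = factor(paste(1:2)))
xy$subj <- factor(paste(xy$REPNO, xy$COND, sep = ":"))
xy$y <- rnorm(40)

## Completely randomized design: a single error stratum for subjects
fit_crd <- aov(y ~ COND * TIME + Error(subj), data = xy)
print(summary(fit_crd))

## REPNO as a blocking factor, with individuals nested within block.
## In this mock-up the individual within a block is indexed by COND.
xy$IndividualWithinBlock <- factor(as.integer(xy$COND))
fit_blk <- aov(y ~ COND * TIME + Error(REPNO/IndividualWithinBlock),
               data = xy)
print(summary(fit_blk))
```

In the first fit COND is tested in the subj stratum, where 16 degrees of
freedom remain for residuals; TIME and COND:TIME are tested in the Within
stratum. In the second fit the 4 degrees of freedom for blocks are
removed first, leaving 12 residual degrees of freedom for the test of
COND.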

Plea to those who post such questions to the list:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Please include either a toy data set or, if the actual data set is small,
lists of factor values.  If you are happy to make the information public,
give the result vector also (this is less important!).  Or you can put the
data and, where relevant, your code, on a web site.

Be careful about the use of the word "groups" in an experimental
design context; speak of "treatment groups" if that is the meaning,
or "blocks" if that is what is intended.  I suspect that confusion
between these two contexts in which the word groups is wont to
be used lay behind the use of the EXPNO/COND form of
model formula.

John Maindonald.


On 10 Sep 2005, at 8:00 PM, Larry A Sonna wrote:


> From: "Larry A Sonna" <larry_sonna at hotmail.com>
> Date: 10 September 2005 12:10:06 AM
> To: <r-help at stat.math.ethz.ch>
> Subject: [R] Discrepancy between R and SPSS in 2-way, repeated  
> measures ANOVA
>
>
> Dear R community,
>
> I am trying to resolve a discrepancy between the way SPSS and R  
> handle 2-way, repeated measures ANOVA.
>
> An experiment was performed in which samples were drawn before and  
> after treatment of four groups of subjects (control and disease  
> states 1, 2 and 3).  Each group contained five subjects.  An  
> experimental measurement was performed on each sample to yield a  
> "signal".  The before and after treatment signals for each subject  
> were treated as repeated measures.  We desire to obtain P values  
> for disease state ("CONDITION"), and the interaction between signal  
> over time and disease state ("CONDITION*TIME").
>
> Using SPSS, the following output was obtained:
>               DF   SumSq (Type 3)   Mean Sq   F value   P=
> COND           3            42861     14287     3.645   0.0355
> TIME           1              473       473     0.175   0.681
> COND*TIME      3              975       325     0.120   0.947
> Error         16            43219      2701
>
>
>
> By contrast, using the following R command:
>
> summary(aov(SIGNAL~(COND+TIME+COND*TIME)+Error(EXPNO/COND), Type="III"))
>
> the output was as follows:
>
>               Df   Sum Sq   Mean Sq   F value    Pr(>F)
> COND           3    26516      8839    3.2517   0.03651 *
> TIME           1      473       473    0.1739   0.67986
> COND:TIME      3      975       325    0.1195   0.94785
> Residuals     28    76107      2718
>
>
>
> I don't understand why the two results are discrepant.  In  
> particular, I'm not sure why R is yielding 28 DF for the residuals  
> whereas SPSS only yields 16.  Can anyone help?

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

