[R] Anova and unbalanced designs

Sat Feb 14 18:09:46 CET 2009

Dear Tal,

> -----Original Message-----
> From: Tal Galili [mailto:tal.galili at gmail.com]
> Sent: February-14-09 10:23 AM
> To: John Fox
> Cc: Peter Dalgaard; Nils Skotara; r-help at r-project.org; Michael Friendly
> Subject: Re: [R] Anova and unbalanced designs
> 
> Hello John and other R mailing list members.
> 
> I've been following your discussions regarding the Anova command for the
SS
> type 2/3 repeated measures Anova, and I have a question:
> 
> I found that when I go from using type II to using type III, the summary
> model is suddenly added with an "intercept" term (example in the end of
the
> e-mail). So my question is
> 1) why is this "intercept" term added (in SS type "III" vs the type "II")?

The computational approach taken in Anova() makes it simpler to include the
intercept in the "type-III" tests and not to include it in the "type-II"
tests.

> 2) Can/should this "intercept" term be removed ? (or how should it be
> interpreted ?)

The test for the intercept is rarely of interest. A "type-II" test for the
intercept would test that the unconditional mean of the response is 0; a
"type-III" test for the intercept would test that the constant term in the
full model fit to the data is 0. The latter depends upon the parametrization
of the model (in the case of an ANOVA model, what kind of "contrasts" are
used). You state that the example that you give is taken from ?Anova but
there's a crucial detail that's omitted: The help file only gives the
"type-II" tests; the "type-III" tests are also reasonable here, but they
depend upon having used "contr.sum" (or another set of contrasts that's
orthogonal in the row basis of the model matrix) for the between-subject
factors, treatment and gender. This detail is in the data set:

> OBrienKaiser$gender
 [1] M M M F F M M F F M M M F F F F
attr(,"contrasts")
[1] contr.sum
Levels: F M

> OBrienKaiser$treatment
 [1] control control control control control A       A       A       A
B       B      
[12] B       B       B       B       B      
attr(,"contrasts")
        [,1] [,2]
control   -2    0
A          1   -1
B          1    1
Levels: control A B

With proper contrast coding, the "type-III" test for the intercept tests
that the mean of the cell means (the "grand mean") is 0.

Had the default dummy-coded contrasts (from contr.treatment) been used, the
tests would not have tested reasonable hypotheses. My advice, from the help
file: "Be very careful in formulating the model for type-III tests, or the
hypotheses tested will not make sense."

I hope this helps,
 John

> 
> My purpose is to be able to use the Anova for analyzing an experiment with
a
> 2 between and 3 within factors, where the between factors are not
balanced,
> and the within factors are (that is why I can't use the aov command).
> 
> 
> #---code start
> 
> #---code start
> 
> #---code start
> 
> # (taken from the ?Anova help file)
> 
> phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5, 5)),
>     levels=c("pretest", "posttest", "followup"))
> hour <- ordered(rep(1:5, 3))
> idata <- data.frame(phase, hour)
> idata
> mod.ok <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
>                      post.1, post.2, post.3, post.4, post.5,
>                      fup.1, fup.2, fup.3, fup.4, fup.5) ~
treatment*gender,
>                 data=OBrienKaiser)
> 
> # now we have two options
> # option one is to use type II:
> 
> (av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour, type = "II"))
> 
> 
> #output:
> Type II Repeated Measures MANOVA Tests: Pillai test statistic
>                             Df test stat approx F num Df den Df    Pr(>F)
> treatment                    2    0.4809   4.6323      2     10 0.0376868
*
> gender                       1    0.2036   2.5558      1     10 0.1409735
> treatment:gender             2    0.3635   2.8555      2     10 0.1044692
> phase                        1    0.8505  25.6053      2      9 0.0001930
***
> treatment:phase              2    0.6852   2.6056      4     20 0.0667354
.
> gender:phase                 1    0.0431   0.2029      2      9 0.8199968
> treatment:gender:phase       2    0.3106   0.9193      4     20 0.4721498
> hour                         1    0.9347  25.0401      4      7 0.0003043
***
> treatment:hour               2    0.3014   0.3549      8     16 0.9295212
> gender:hour                  1    0.2927   0.7243      4      7 0.6023742
> treatment:gender:hour        2    0.5702   0.7976      8     16 0.6131884
> phase:hour                   1    0.5496   0.4576      8      3 0.8324517
> treatment:phase:hour         2    0.6637   0.2483     16      8 0.9914415
> gender:phase:hour            1    0.6950   0.8547      8      3 0.6202076
> treatment:gender:phase:hour  2    0.7928   0.3283     16      8 0.9723693
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
> 
> # option two is to use type III, and then get an added intercept term:
>  (av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour, type = "III"))
> 
> 
> # here is the output:
> Type III Repeated Measures MANOVA Tests: Pillai test statistic
>                             Df test stat approx F num Df den Df    Pr(>F)
> (Intercept)                  1     0.967  296.389      1     10 9.241e-09
***
> treatment                    2     0.441    3.940      2     10 0.0547069
.
> gender                       1     0.268    3.659      1     10 0.0848003
.
> treatment:gender             2     0.364    2.855      2     10 0.1044692
> phase                        1     0.814   19.645      2      9 0.0005208
***
> treatment:phase              2     0.696    2.670      4     20 0.0621085
.
> gender:phase                 1     0.066    0.319      2      9 0.7349696
> treatment:gender:phase       2     0.311    0.919      4     20 0.4721498
> hour                         1     0.933   24.315      4      7 0.0003345
***
> treatment:hour               2     0.316    0.376      8     16 0.9183275
> gender:hour                  1     0.339    0.898      4      7 0.5129764
> treatment:gender:hour        2     0.570    0.798      8     16 0.6131884
> phase:hour                   1     0.560    0.478      8      3 0.8202673
> treatment:phase:hour         2     0.662    0.248     16      8 0.9915531
> gender:phase:hour            1     0.712    0.925      8      3 0.5894907
> treatment:gender:phase:hour  2     0.793    0.328     16      8 0.9723693
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
> 
> 
> #---code end
> 
> #---code end
> 
> #---code end
> 
> 
> 
> Thanks in advance for your help!
> Tal Galili
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Sun, Jan 25, 2009 at 3:08 AM, John Fox <jfox at mcmaster.ca> wrote:
> 
> 
> 	Dear Peter and Nils,
> 
> 	In my initial message, I stated misleadingly that the contrast
coding
> didn't
> 	matter for the "type-III" tests here since there is just one
> 	between-subjects factor, but that's not right: The between type-III
SS
> is
> 	correct using contr.treatment(), but the within SS is not. As is
> generally
> 	the case, to get reasonable type-III tests (i.e., tests of
reasonable
> 	hypotheses), it's necessary to have contrasts that are orthogonal in
> the
> 	row-basis of the design, such as contr.sum(),  contr.helmert(), or
> 	contr.poly(). The "type-II" tests, however, are insensitive to the
> contrast
> 	parametrization. Anova() always uses an orthogonal parametrization
for
> the
> 	within-subjects design.
> 
> 	The general advice in ?Anova is, "Be very careful in formulating the
> model
> 	for type-III tests, or the hypotheses tested will not make sense."
> 
> 	Thanks, Peter, for pointing this out.
> 
> 
> 	John
> 
> 	------------------------------
> 	John Fox, Professor
> 	Department of Sociology
> 	McMaster University
> 	Hamilton, Ontario, Canada
> 	web: socserv.mcmaster.ca/jfox
> 
> 
> 	> -----Original Message-----
> 
> 	> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
> 	> Sent: January-24-09 6:31 PM
> 	> To: Nils Skotara
> 	> Cc: John Fox; r-help at r-project.org; 'Michael Friendly'
> 	> Subject: Re: [R] Anova and unbalanced designs
> 	>
> 
> 	> Nils Skotara wrote:
> 	> > Dear John,
> 	> >
> 	> > thank you again! You replicated the type III result I got in
SPSS!
> When
> 	I
> 	> > calculate Anova() type II:
> 	> >
> 	> > Univariate Type II Repeated-Measures ANOVA Assuming Sphericity
> 	> >
> 	> >                     SS num Df Error SS den Df      F  Pr(>F)
> 	> > between         4.8000      1   9.0000      8 4.2667 0.07273 .
> 	> > within          0.2000      1  10.6667      8 0.1500 0.70864
> 	> > between:within  2.1333      1  10.6667      8 1.6000 0.24150
> 	> > ---
> 	> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 	> >
> 	> > I see the exact same values as you had written.
> 	> > However, and now I am really lost, type III (I did not change
> anything
> 	> else)
> 	> > leads to the following:
> 	> >
> 	> > Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
> 	> >
> 	> >                               SS num Df Error SS den Df       F
> 	Pr(>F)
> 	> > (Intercept)               72.000      1    9.000      8 64.0000
> 	4.367e-05
> 	> ***
> 	> > between                    4.800      1    9.000      8  4.2667
> 	0.07273 .
> 	> > as.factor(within)          2.000      1   10.667      8  1.5000
> 	0.25551
> 	> > between:as.factor(within)  2.133      1   10.667      8  1.6000
> 	0.24150
> 	> > ---
> 	> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 	> >
> 	> > How is this possible?
> 	>
> 	> This looks like a contrast parametrization issue: If we look at
the
> 	> per-group mean within-differences and their SE, we get
> 	>
> 	>  > summary(lm(within1-within2~between - 1))
> 	> ..
> 	> Coefficients:
> 	>           Estimate Std. Error t value Pr(>|t|)
> 	> between1  -1.0000     0.8165  -1.225    0.256
> 	> between2   0.3333     0.6667   0.500    0.631
> 	> ..
> 	>  > table(between)
> 	> between
> 	> 1 2
> 	> 4 6
> 	>
> 	> Now, the type II F test is based on weighting the two means as you
> would
> 	> after testing for no interaction
> 	>
> 	>  > (4*-1+6*.3333)^2/(4^2*0.8165^2+6^2*0.6667^2)
> 	> [1] 0.1500205
> 	>
> 	> and type III is to weight them as if there had been equal counts
> 	>
> 	>  > (5*-1+5*.3333)^2/(5^2*0.8165^2+5^2*0.6667^2)
> 	> [1] 0.400022
> 	>
> 	> However, the result above corresponds to looking at group1 only
> 	>
> 	>  > (-1)^2/(0.8165^2)
> 	> [1] 1.499987
> 	>
> 	> It helps if you choose orhtogonal contrast parametrizations:
> 	>
> 	>  > options(contrasts=c("contr.sum","contr.helmert"))
> 	>  > betweenanova <- lm(values ~ between)> Anova(betweenanova,
> idata=with,
> 	> idesign= ~as.factor(within), type = "III" )
> 	>
> 	> Type III Repeated Measures MANOVA Tests: Pillai test statistic
> 	>                            Df test stat approx F num Df den Df
> Pr(>F)
> 	> (Intercept)                1     0.963  209.067      1      8
5.121e-
> 07
> 	***
> 	> between                    1     0.348    4.267      1      8
> 0.07273 .
> 	> as.factor(within)          1     0.048    0.400      1      8
> 0.54474
> 	> between:as.factor(within)  1     0.167    1.600      1      8
> 0.24150
> 	> ---
> 	> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 	>
> 	>
> 	>
> 	>
> 	> --
> 	>     O__  ---- Peter Dalgaard             Øster Farimagsgade 5,
Entr.B
> 	>    c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> 	>   (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45)
> 35327918
> 	> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45)
> 35327907
> 
> 	______________________________________________
> 	R-help at r-project.org mailing list
> 	https://stat.ethz.ch/mailman/listinfo/r-help
> 	PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> 	and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> 
> 
> --
> ----------------------------------------------
> 
> 
> My contact information:
> Tal Galili
> Phone number: 972-50-3373767
> FaceBook: Tal Galili
> My Blogs:
> www.talgalili.com
> www.biostatistics.co.il
> 
>