[R] Type II and III sum of square in Anova (R, car package)

Mark Lyman mark.lyman at gmail.com
Sun Aug 27 07:50:40 CEST 2006


> 1. First of all, a more general question. The standard anova() function for
> lm() or aov() models in R implements Type I sums of squares (sequential),
> which are not well suited for unbalanced ANOVA. Therefore it is better to use
> the Anova() function from the car package, which was programmed by John Fox
> to use Type II and Type III sums of squares. Did I get the point?
> 
> 2. Now a more specific question. Type II sums of squares are not well suited
> for unbalanced ANOVA designs either (as stated in the STATISTICA help), so
> is the general rule of thumb to use the Anova() function with Type II SS
> only for balanced ANOVA, and the Anova() function with Type III SS for
> unbalanced ANOVA? Is this the correct interpretation?
> 
> 3. I have found a post from John Fox in which he wrote that Type III SS
> could be misleading if someone uses certain contrasts. What is this about?
> Could you please advise when it is appropriate to use Type II and when
> Type III SS? I do not use contrasts for comparisons, just general ANOVA
> with subsequent Tukey post-hoc comparisons.
 
There are many threads on this list that discuss this issue. Not being a great
statistician myself, I would suggest reading through some of them as a start.
As I understand it, the best philosophy regarding types of sums of squares is
to use the type that tests the hypothesis you want. The types were developed as
a convenience, to test many of the hypotheses a person might want
"automatically" and put the results into a nice, neat little table. However,
with an interactive system like R it is usually even easier to test a full
model against a reduced model directly. That is, if I want to test the
significance of an interaction, I would use anova(lm.fit2, lm.fit1), where
lm.fit2 contains the interaction and lm.fit1 does not. The anova function will
return the appropriate F-test. The danger in worrying about which type of sums
of squares to use is that we often do not think about which hypotheses we are
testing and whether those hypotheses make sense in our situation.
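
A minimal sketch of the full-versus-reduced comparison described above. The
data frame dat, the factors A and B, and the response y are made up purely for
illustration (an unbalanced two-factor layout with random noise); only the
names lm.fit1 and lm.fit2 come from the text. The models are passed to anova()
with the reduced one first, which is the usual convention and gives the same
F-test with a positive degrees-of-freedom difference.

```r
# Hypothetical unbalanced two-factor data; every name here is illustrative.
set.seed(1)
cells <- expand.grid(A = factor(c("a1", "a2")),
                     B = factor(c("b1", "b2", "b3")))
# Unequal replication per cell makes the design unbalanced (23 rows in total).
dat <- cells[rep(seq_len(nrow(cells)), times = c(3, 5, 4, 2, 6, 3)), ]
dat$y <- rnorm(nrow(dat), mean = 10)

lm.fit1 <- lm(y ~ A + B, data = dat)  # reduced model: main effects only
lm.fit2 <- lm(y ~ A * B, data = dat)  # full model: adds the A:B interaction

# F-test of the hypothesis that all A:B interaction coefficients are zero.
cmp <- anova(lm.fit1, lm.fit2)
print(cmp)
```

The second row of the printed table carries the F statistic and p-value for
the interaction. The same pattern applies to any other hypothesis: fit the
model with and without the terms in question and compare the two fits.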

Mark Lyman


