[R] type III Sum Sq in ANOVA table - Howto?

Bill.Venables@CMIS.CSIRO.AU Bill.Venables at CMIS.CSIRO.AU
Fri Mar 7 08:46:11 CET 2003


Having sounded off on this issue so frequently (and furiously) in the past
it is perhaps de rigueur for me to say something here...  I'm older and
calmer now, though.

Suppose we have:

	library(MASS)   # provides dropterm()
	fm <- aov(y ~ x + A*B, data)

Then

	dropterm(fm, test = "F")

will get you the appropriate information for excluding the *marginal*
terms, one at a time, from the model, i.e. for x and A:B.  It's not a bug
that nothing else happens automatically.
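
For a concrete run, here is a minimal sketch on simulated, unbalanced data.
The data frame dat, its cell counts and its values are invented purely for
illustration; dropterm comes from the MASS package.

```r
library(MASS)  # dropterm() lives in MASS

set.seed(1)
# Hypothetical unbalanced 2 x 3 layout with a covariate x
cells <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2", "b3"))
dat <- cells[rep(seq_len(nrow(cells)), c(3, 5, 4, 6, 7, 2)), ]
dat$x <- rnorm(nrow(dat))
dat$y <- 1 + 0.5 * dat$x + rnorm(nrow(dat))

fm <- aov(y ~ x + A * B, data = dat)

# Default scope: only terms that can be dropped while respecting marginality
tab <- dropterm(fm, test = "F")
print(tab)
```

The table should contain a row for the full model and one row each for x
and A:B; the main effects A and B are left out because A:B is still in the
model.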

If you want sums of squares for the non-marginal terms, in this case for the
main effects A and B, as well, you have strayed into tricky territory.  The
first sign that not all is as it seems is that the test now depends on which
contrasts you have specified: contr.treatment or one of the others.  If you
DO NOT use contr.treatment but one where the column sums of the contrast
matrix are all zero (contr.sum or contr.helmert, say), then you can get the
"SAS Type III" sums of squares by an inexplicable (to me) trick:

	dropterm(fm, . ~ ., test = "F")

but you can check that changing the contrasts back to "contr.treatment"
gives you different (and even more dud) results.
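
The contrast dependence can be checked directly.  This is only a sketch,
assuming simulated unbalanced data (dat and everything in it is made up) and
using the contrasts argument of aov to switch codings per fit:

```r
library(MASS)

set.seed(2)
# Hypothetical unbalanced 2 x 3 layout with a covariate x
cells <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2", "b3"))
dat <- cells[rep(seq_len(nrow(cells)), c(3, 5, 4, 6, 7, 2)), ]
dat$x <- rnorm(nrow(dat))
dat$y <- 1 + 0.5 * dat$x + rnorm(nrow(dat))

# Same model, two different codings of the factors
fm_sum <- aov(y ~ x + A * B, data = dat,
              contrasts = list(A = "contr.sum", B = "contr.sum"))
fm_trt <- aov(y ~ x + A * B, data = dat,
              contrasts = list(A = "contr.treatment", B = "contr.treatment"))

# Force every term into the scope, marginal or not
t_sum <- dropterm(fm_sum, . ~ ., test = "F")
t_trt <- dropterm(fm_trt, . ~ ., test = "F")

# Side-by-side sums of squares under the two codings
print(data.frame(sum       = t_sum[, "Sum of Sq"],
                 treatment = t_trt[, "Sum of Sq"],
                 row.names = rownames(t_sum)))
```

The rows for x and A:B should agree between the two columns, while the rows
for A and B should not.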

Rolf is right: there are conceivably cases where this is testing an
hypothesis of interest, just as occasionally it is interesting to test if a
regression line goes through the origin or if a quadratic regression has
zero slope at some point.  But such cases are rare, and in 35 years of
consulting I have never really encountered one.  The often-quoted reason to
use 'Type III' tests is "to test the main effects when interactions ARE
present", which, if not further amplified or explained, really is a
nonsense.  My quarrel with SAS is that what they routinely provide
*encourages* misunderstandings like this and hence bad inference.  Making
users go to some length to get such results is, in my view, no bad thing
(although the sequential AOV table that R and S-PLUS routinely provide is in
some respects not much better from this point of view).

Moral: Decide what null hypothesis you would like to test, within what outer
hypothesis.  Fit both models and explicitly test one within the other.
There is then no need at all for any of this Type x palaver.  Attempts to
short-circuit the process with anova tables have to be viewed with some
caution, even scepticism, as the capacity for nonsense is considerable.
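
As an illustration of the moral, here is a sketch that takes "no A:B
interaction" as the null hypothesis within the full model as the outer
hypothesis.  The data are simulated and the names fm_null and fm_outer are
made up for the example.

```r
set.seed(3)
# Hypothetical unbalanced 2 x 3 layout with a covariate x
cells <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2", "b3"))
dat <- cells[rep(seq_len(nrow(cells)), c(3, 5, 4, 6, 7, 2)), ]
dat$x <- rnorm(nrow(dat))
dat$y <- 1 + 0.5 * dat$x + rnorm(nrow(dat))

fm_outer <- lm(y ~ x + A * B, data = dat)   # outer hypothesis
fm_null  <- lm(y ~ x + A + B, data = dat)   # null: no A:B interaction

# Explicit F test of the null model within the outer model
cmp <- anova(fm_null, fm_outer)
print(cmp)
```

Because both models are written out, the hypothesis being tested is in
plain view and does not depend on how the factors are coded.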

Note that if you go no further than what drop1 or dropterm provides under
the default case, i.e. marginal terms only, then we have no quarrel.  These
are precisely the terms whose tests are invariant with respect to the choice
of contrast matrix.  However, beware of hidden non-marginal terms, such as
the linear term in a quadratic regression.
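
The quadratic case can be sketched as follows, on simulated data with
invented names.  drop1 has no way of knowing that I(x^2) is a higher-order
term in x, so it offers to drop the linear term, and the resulting test
changes when the origin of x is moved:

```r
set.seed(4)
# Hypothetical data: a parabola with its turning point at x = 5
d <- data.frame(x = runif(30, 0, 10))
d$y  <- 2 + 0.3 * (d$x - 5)^2 + rnorm(30)
d$xc <- d$x - 5                      # the same covariate, origin shifted

raw     <- drop1(lm(y ~ x  + I(x^2),  data = d), test = "F")
shifted <- drop1(lm(y ~ xc + I(xc^2), data = d), test = "F")

print(raw)      # linear row tests for zero slope at x = 0
print(shifted)  # linear row tests for zero slope at x = 5
```

The two fits are the same model, and the quadratic rows should agree, but
the linear rows answer different questions.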

Bill Venables.
>  -----Original Message-----
> From: 	John Fox [mailto:jfox at mcmaster.ca] 
> Sent:	Friday, March 07, 2003 12:39 PM
> To:	Thomas Lumley; Josef Frank
> Cc:	r-help at stat.math.ethz.ch
> Subject:	Re: [R] type III Sum Sq in ANOVA table - Howto?
> 
> Dear Thomas et al.,
> 
> At 05:33 PM 3/6/2003 -0800, Thomas Lumley wrote:
> >On Fri, 7 Mar 2003, Josef Frank wrote:
> >
> > > Hello,
> > >
> > > as far as I see, R reports type I sums of squares. I'd like to get R to
> > > print out type III sums of squares.
> > >
> > > e.g. I have the following model:
> > > vardep~factor1*factor2
> > >
> > > to get the type III sum of squares for factor1 I've tried
> > > anova(lm(vardep~factor2+factor1:factor2),lm(vardep~factor1*factor2))
> > > but that didn't yield the desired result.
> > >
> > > Could anyone give me a hint how to proceed?
> > >
> >
> >Unfortunately the arguments about whether Type III sums of squares are
> >part of the axis of evil have drowned out a real issue.
> >
> >I would have expected the command to work, and in fact wrote a FAQ answer
> >saying this was the way to do it.  However, if factor1 is indeed a factor
> >its main effect is helpfully stuck back in the model by terms.formula.
> >
> >I think this is a bug, since it doesn't happen if factor1 isn't a factor,
> >and leaving aside any question about Type III SS it seems to make it
> >impossible to fit the model
> >    lm(vardep~factor2+factor1:factor2)
> >While this model isn't terribly often useful, it is sometimes.
> 
> The description of model formulas in Ch. 2 of Statistical Models in S 
> explains why ~factor2+factor1:factor2 is treated as it is.
> 
> Assuming that one really wants to test a "Type-III" hypothesis, the Anova 
> function in the car package will do it (and "Type-II" tests as well).
> 
> Regards,
>   John
> 
> -----------------------------------------------------
> John Fox
> Department of Sociology
> McMaster University
> Hamilton, Ontario, Canada L8S 4M4
> email: jfox at mcmaster.ca
> phone: 905-525-9140x23604
> web: www.socsci.mcmaster.ca/jfox
> -----------------------------------------------------
> 


