[R] Type III sums of squares.
Martin Henry H. Stevens
hstevens at muohio.edu
Wed Oct 17 15:35:43 CEST 2001
From a disciple:
Allow me to make this discussion a little more concrete for my mortal mind.
Let Factor A = grams of nitrogen fertilizer (levels i=1,2)
Let Factor B = watering regime (levels j=1,2)
Let response Y = Yield of soybeans (g / m^2)
Suppose we measure yield in different treatments and find means of:
       A1    A2
B1     50   100
B2     60   150
Suppose further that we have sufficiently small error to detect differences
among all means and (of course) "significant" main effects and significant
interaction. I would argue strongly that adding nitrogen, regardless of
other factors, increases yield. I would also argue that adding water,
regardless of other factors, increases yield. I would also conclude that
adding both together increases yield more than you might expect based on
adding each factor separately. In a messy ecological setting, we frequently
don't know such basic information.
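
To make the arithmetic behind those three claims explicit, here is a minimal Python sketch. It assumes the two columns of the table are the nitrogen levels A1 and A2 and the rows are the watering levels B1 and B2; the variable names are mine and purely illustrative.

```python
# Cell means from the table above, keyed by (A level, B level).
# Assumption: columns are nitrogen levels A1, A2; rows are watering levels B1, B2.
means = {(1, 1): 50, (2, 1): 100,
         (1, 2): 60, (2, 2): 150}

# Simple effect of nitrogen within each watering level j.
nitrogen_effect = {j: means[(2, j)] - means[(1, j)] for j in (1, 2)}
# Simple effect of water within each nitrogen level i.
water_effect = {i: means[(i, 2)] - means[(i, 1)] for i in (1, 2)}

# Main effects: simple effects averaged over the levels of the other factor.
main_nitrogen = sum(nitrogen_effect.values()) / 2
main_water = sum(water_effect.values()) / 2

# Interaction contrast: how much the nitrogen effect changes with watering.
interaction = nitrogen_effect[2] - nitrogen_effect[1]

print(nitrogen_effect, water_effect, main_nitrogen, main_water, interaction)
```

Every simple effect is positive, so each factor raises yield at both levels of the other. The nonzero interaction contrast only says that the size of the nitrogen effect differs between watering regimes.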
A more important example arises when factor B is a random effect (spatially
arrayed blocks outside the control of the experimenter?). In such a case, if
levels of B provide a representative sample of the relevant universe, then a
significant main effect demonstrates an overall trend REGARDLESS of what is
going on within each block. Significant interaction may provide interesting
details in the system, greater insight etc., but in the end, we might be
interested primarily in the overall trend across a landscape.
Hstevens at muohio.edu
----- Original Message -----
From: "Rolf Turner" <rolf at maths.uwa.edu.au>
To: <r-help at stat.math.ethz.ch>
Sent: Wednesday, October 17, 2001 12:16 AM
Subject: [R] Type III sums of squares.
> Peter Dalgaard writes (in response to a question about 2-way ANOVA
> with imbalance):
> > ... There are various
> > boneheaded ways in which people try to assign some kind of
> > SumSq to main effects in the presence of interaction, and they are all
> > wrong - although maybe not very wrong if the unbalance is slight.
> People keep saying this --- very vehemently --- and it is NOT TRUE.
> Point 1 --- imbalance is really irrelevant here, a fact which
> is usually (always?) overlooked. If the design is balanced,
> all ``types'' of sums of squares are the same. The sequential
> sums of squares which R will happily produce might well contain
> ``significant'' values for SSA and/or SSB ***and*** a significant
> value for the interaction sum of squares, SSAB.
> Point 2 --- What does such ``significance'' ***mean***? It is not
> correct to say that it means nothing at all. The significance
> of say, SSA, reports on the result of the test of a hypothesis.
> This hypothesis is a ***meaningful*** hypothesis. It may well not be
> an important hypothesis, or a particularly interesting hypothesis,
> or a hypothesis that the experimenter actually cares about.
> It is substantially different from the hypothesis which is tested
> by SSA when there is no interaction. (Different, but related.)
> Bill Venables fulminates that consideration of such a hypothesis is
> contrary to the fundamental philosophy of statistical modelling, and
> thereby an abomination in the sight of God, and probably Politically
> Incorrect to boot. This may well be so. Nonetheless it ***is***
> a well-defined and meaningful hypothesis.
> Rather than dismissing the testing of such a hypothesis as being
> ``bone-headed'', the guru should point out to the disciple
>   (a) just what hypothesis is being tested,
>   (b) that this hypothesis packs a substantially different
>       load of freight than that which is tested when there is
>       no interaction, and
>   (c) that the disciple should carefully search his or her
>       soul as to whether the hypothesis which is being tested
>       is of any actual interest.
> This would go much further toward bringing the disciple to true
> Point 3 --- what hypothesis is being tested by SSA?
> Let factor A correspond to index i, and B to index j.
> Let the cell means be mu_ij. (In the overparameterized
> notation, mu_ij = mu + alpha_i + beta_j + gamma_ij.)
> The hypothesis being tested is
> H_0: mu_1.-bar = mu_2.-bar = ... = mu_a.-bar
> where factor A has a levels, and ``mu_i.-bar'' means
> the average (arithmetic mean) of mu_i1, mu_i2, ..., mu_ib.
> (Note --- factor B has b levels.)
> I.e. the hypothesis is that there is no difference, on average,
> between the levels of A, the average being taken over the levels
> of B.
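
The hypothesis in Point 3 can be checked directly on a table of cell means. The sketch below is only an illustration: the cell means are hypothetical, and the function names are my own rather than anything from R.

```python
from statistics import mean

def marginal_means_A(mu):
    """Return mu_i.-bar for each level i of A.

    mu is a list of rows; mu[i][j] is the cell mean for level i of A
    and level j of B. Each row is averaged with equal weights.
    """
    return [mean(row) for row in mu]

def h0_holds(mu, tol=1e-12):
    """True if all the mu_i.-bar are equal (up to tol)."""
    m = marginal_means_A(mu)
    return max(m) - min(m) <= tol

# Hypothetical cell means: a = 2 levels of A, b = 3 levels of B.
mu = [[10, 20, 30],
      [30, 20, 10]]

print(marginal_means_A(mu))
print(h0_holds(mu))
```

In this made-up table the rows differ cell by cell, yet their unweighted averages agree, so H_0 holds even though A clearly matters within individual levels of B.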
> Now taking such an average may not be a sensible thing to do,
> but it is perfectly well-defined, and thus a ***meaningful***
> hypothesis is being tested. (The meaning of which the hypothesis
> is full might not be very exciting, but that is more of a practical
> than a statistical issue.)
> Note that the hypothesis being tested, while possibly of dubious
> import, is perfectly comprehensible to the human mind.
> (Remark: In real life, if we were really interested in averaging
> over the levels of B at all, we would probably want a ***weighted***
> average, with the weights corresponding to the preponderance of
> the levels of B in the population.)
> Note that if there is no interaction (if the gamma_ij are all zero)
> then the hypothesis being tested is that for each fixed j, the mu_ij
> are all ***identical*** (say mu_ij = tau_j) and hence the averages
> over j are equal (mu_i.-bar = tau.-bar, independent of i.)
> This is all easier to think about graphically. For each j, plot the
> mu_ij against the index i, giving a ``profile''. ``No interaction''
> means that all profiles are parallel. No interaction and no A
> effect means that all profiles are horizontal.
> If the profiles are parallel, then all profiles will be horizontal
> if and only if their mean is horizontal.
> However if the profiles are ***not*** parallel (i.e. if there is
> interaction) their means may be horizontal anyhow.
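
The profile picture translates into a few lines of Python. This is a sketch with hypothetical numbers; `is_parallel`, `mean_profile` and `is_horizontal` are names I made up for the illustration.

```python
def is_parallel(profiles, tol=1e-12):
    """Profiles are parallel if each differs from the first by a constant shift."""
    first = profiles[0]
    for p in profiles[1:]:
        shifts = [x - y for x, y in zip(p, first)]
        if max(shifts) - min(shifts) > tol:
            return False
    return True

def mean_profile(profiles):
    """Pointwise average of the profiles; entry i is mu_i.-bar."""
    return [sum(col) / len(col) for col in zip(*profiles)]

def is_horizontal(profile, tol=1e-12):
    """A profile is horizontal if all of its values coincide."""
    return max(profile) - min(profile) <= tol

# profiles[j][i] = mu_ij, i.e. one profile per level j of B (hypothetical values).
crossing = [[1.0, 3.0], [3.0, 1.0]]   # interaction present
parallel = [[1.0, 2.0], [3.0, 4.0]]   # no interaction, but an A effect

for name, prof in (("crossing", crossing), ("parallel", parallel)):
    print(name, is_parallel(prof), is_horizontal(mean_profile(prof)))
```

The crossing profiles are not parallel, yet their mean profile is flat. The parallel profiles have a sloping mean profile, and with parallel profiles a sloping mean implies that every individual profile slopes too.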
> Let me repeat: This horizontality may not be of much interest if
> the profiles are not parallel, but it is a perfectly well-defined
> concept, and testing for it makes perfect sense in the abstract.
> Point 4 --- on the (remote?) chance that we really are interested in
> the above horizontality, and if the design is in fact NOT BALANCED,
> then the much maligned type III sums of squares are ***definitely***
> called for. Type III sums of squares will test the null hypothesis
> stated in Point 3, irrespective of balance. Sequential sums of
> squares will test another, different, and totally bizarre hypothesis.
> (Again a perfectly ``meaningful'' hypothesis, but one such that the
> meaning is really too convoluted to admit any sort of comprehension
> by the human mind. Moreover this hypothesis is dependent on the
> design structure, rendering it even more unlikely to be of any
> interest, even if one could understand what it is saying.)
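
A partial illustration of the dependence on design structure, restricted to the simplest case. When A is entered first, the sequential sum of squares for A compares row means weighted by the cell counts n_ij, whereas the hypothesis of Point 3 uses unweighted row means. The sketch below uses hypothetical cell means and counts. It fits no model and computes no sums of squares; it only shows how the two kinds of average part company under imbalance.

```python
def unweighted_row_means(mu):
    """mu_i.-bar: each cell in row i gets equal weight."""
    return [sum(row) / len(row) for row in mu]

def count_weighted_row_means(mu, n):
    """Row means with cell (i, j) weighted by its sample size n[i][j]."""
    return [sum(m * k for m, k in zip(mrow, nrow)) / sum(nrow)
            for mrow, nrow in zip(mu, n)]

# Hypothetical cell means (rows = levels of A, columns = levels of B).
mu = [[10.0, 30.0],
      [30.0, 10.0]]

n_balanced = [[5, 5], [5, 5]]
n_unbalanced = [[8, 2], [5, 5]]   # hypothetical unequal cell counts

print(unweighted_row_means(mu))
print(count_weighted_row_means(mu, n_balanced))
print(count_weighted_row_means(mu, n_unbalanced))
```

With balanced counts the weighted and unweighted row means coincide. With the unbalanced counts the weighted row means differ from each other even though the unweighted ones are equal, so a test built on them answers a question that changes with the cell counts.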
> Rolf Turner
> rolf at maths.unb.ca
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch