[R] Anova - interpretation of the interaction term

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Sat Apr 23 15:33:00 CEST 2005


On 23-Apr-05 Bill.Venables at csiro.au wrote:
>: -----Original Message-----
>: From: r-help-bounces at stat.math.ethz.ch 
>: [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of 
>: michael watson (IAH-C)
>: [...]
>: I have a highly significant interaction term. In the context
>: of the experiment, this makes sense. I can visualise the data 
>: graphically, and sure enough I can see that both factors have
>: different effects on the data DEPENDING on what the value of
>: the other factor is.  
>: 
>: I explain this all to my colleague - and she asks "but which
>: ones are different?" This is best illustrated with an example.
>: We have either infected | uninfected, and vaccinated | unvaccinated
>: (the two factors).
>: We're measuring expression of a gene.  Graphically, in the
>: infected group, vaccination makes expression go up. In the
>: uninfected group, vaccination makes expression go down. In
>: both the vaccinated and unvaccinated groups, infection makes
>: expression go down, but it goes down further in unvaccinated
>: than it does in vaccinated.
>: 
>: So from a statistical point of view, I can see exactly why
>: the interaction term is significant, but what my colleage
>: wants to know is that WITHIN the vaccinated group, does
>: infection decrease expression significantly? And within
>: the unvaccinated group, does infection decrease expression
>: significantly?  Etc etc etc  Can I get this information from
>: the output of the ANOVA, or do I carry out a separate
>: test on e.g. just the vaccinated group? (seems a cop out to me)
> 
> No, you can't get this kind of specific information out of the anova
> table and yes, anova tables *are* a bit of a cop out.  (I sometimes 
> think they should only be allowed between consenting adults in
> private.)

I think the "cop out" Michael Watson was referring to means
going back to basics and doing a separate analysis on each group
(though no doubt using the Res SS from the AoV table).

Not that I disagree with your comment: I sometimes think that
anova tables are often passed round between adults in order to
induce consent which might otherwise have been withheld.

> What you are asking for is a non-standard, but perfectly
> reasonable partition of the degrees of freedom between the
> classes of a single factor with four levels got by pairing
> up the levels of vaccination and innoculation. Of course you
> can get this information, but you have to do a bit of work
> for it.  

It seems to me that this is a wrapper for "separate analysis
on each group"!

> Before I give the example which I don't expect too many people
> to read entirely, let me issue a little challenge, namely to
> write tools to automate a generalized version of the procedure
> below.

[technical setup snipped]

>> contrasts(dat$vac_inf) <- ginv(m)
>> gm <- aov(y ~ vac_inf, dat)
>> summary(gm)
>             Df  Sum Sq Mean Sq F value  Pr(>F)
> vac_inf      3 12.1294  4.0431   7.348 0.04190
> Residuals    4  2.2009  0.5502
> 
> This doesn't tell us too much other than there are differences,
> probably.  Now to specify the partition:
>                 
>> summary(gm, 
>       split = list(vac_inf = list("- vs +|N" = 1, 
>                                   "- vs +|Y" = 2)))
>                     Df  Sum Sq Mean Sq F value  Pr(>F)
> vac_inf              3 12.1294  4.0431  7.3480 0.04190
>   vac_inf: - vs +|N  1  7.9928  7.9928 14.5262 0.01892
>   vac_inf: - vs +|Y  1  3.7863  3.7863  6.8813 0.05860
> Residuals            4  2.2009  0.5502

Wow, Bill! Dazzling. This is like watching a rabbit hop
into a hat, and fly out as a dove. I must study this syntax.
But where can I find out about the "split" argument to "summary"?
I've found the *function* split, whose effect is similar, but
I've wandered around the "summary", "summary.lm" etc. forest
for a while without locating the *argument*.

My naive ("cop-out") approach would have been on the lines
of (without setting up the contrast matrix):

  summary(aov(y~vac*inf,data=dat))
              Df  Sum Sq Mean Sq F value  Pr(>F)  
  vac          1  0.3502  0.3502  0.6364 0.46968  
  inf          1 11.3908 11.3908 20.7017 0.01042 *
  vac:inf      1  0.3884  0.3884  0.7058 0.44812  
  Residuals    4  2.2009  0.5502                  

so we get the 2.2009 on 4 df SS for redisuals with mean SS 0.5502.

Then I would do:

  mNp<-mean(y[(vac=="N")&(inf=="+")])
  mNm<-mean(y[(vac=="N")&(inf=="-")])
  mYp<-mean(y[(vac=="Y")&(inf=="+")])
  mYm<-mean(y[(vac=="Y")&(inf=="-")])

  c(     mYp,       mYm,       mNp,      mNm       )
  ##[1]  2.4990492  0.5532018  2.5212655 -0.3058972

  c(    mYp-mYm,   mNp-mNm )
  ##[1] 1.945847   2.827163


after which:

  1-pt(((mYp-mYm)/sqrt(0.5502)),4)
  ##[1] 0.02929801

  1-pt(((mNp-mNm)/sqrt(0.5502)),4)
  ##[1] 0.009458266

give you 1-sided t-tests, and

  1-pf(((mYp-mYm)/sqrt(0.5502))^2,1,4)
  ##[1] 0.05859602

  1-pf(((mNp-mNm)/sqrt(0.5502))^2,1,4)
  ##[1] 0.01891653

give you F-tests (equivalent to 2-sided t-tests) which agree
with Bill's results above.

And, in this case, presenting the results as mean differences
shows that the effect of infection appears to differ between
vaccinated and unvaccinated groups; but a simple test shows
this not to be significant:

  1-pf( (sqrt(1/2)*((mYp-mYm)-(mNp-mNm))/sqrt(0.5502))^2, 1,4)
  ##[1] 0.4481097

As I said above, this would be my naive approach to this
particular case (and to any like it), and I would expect
to be able to explain all this in simple terms to a colleague
who was asking the sort of questions you have reported.
Or is it the case that offering an anova table is needed,
in order to evoke consent to the results by virtue of the
familiar format?

> As expected, infection changes the mean for both vaccinated and
> unvaccinated, as we arranged when we generated the data.

The challenge to generalise is interesting. However, as implied
above, I'm already impressed by Bill's footwork in R for this
simple case, and it might be some time before I'm fluent
enough in that sort of thing to deal with more complicated
cases, let alone the general one.

For users like myself, a syntax whose terms are closer to
ordinary language would be more approachable. Something
on the lines of

  summary(aov(y ~ vac*inf), by=inf, within=vac)

which would present a table similar to Bill's above (by inf
within the different levels of vac).

Intriguing. The challenge is tempting ... !

Best wishes,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 23-Apr-05                                       Time: 14:33:00
------------------------------ XFMail ------------------------------




More information about the R-help mailing list