[Rd] anova() on three or more objects behaves inconsistently (PR#621)

ripley@stats.ox.ac.uk ripley@stats.ox.ac.uk
Tue, 1 Aug 2000 11:02:05 +0200 (MET DST)


anova() on three or more objects behaves inconsistently in R.

In R, anovalist.lm does a sequential ANOVA using pairwise F tests,
ignoring all the other objects, so the larger of the two models in
each pair provides the denominator.

In S, anova.lmlist uses the denominator from the largest model (smallest
residual df) in the set, as does anova.glmlist in both S and R.

I suggest that R's anovalist.lm is wrong (that might be a matter of opinion
and can certainly be argued), but anovalist.lm and anova.glmlist should
agree.

?anova says in R

     When given a sequence of objects, `anova' tests the models against
     one another in the order specified.

and in S

       When two or more objects are used in the call,  a  similar
       table  is  produced  showing  the  effects of the pairwise
       differences between the  models,  considered  sequentially
       from first to last.

neither of which is clear enough to disambiguate this.


Example:

library(MASS)
data(quine)
quine.hi <- aov(log(Days + 2.5) ~ .^4, quine)
quine.nxt <- update(quine.hi, . ~ . - Eth:Sex:Age:Lrn)
quine.lo <- aov(log(Days+2.5) ~ 1, quine)
anova(quine.hi, quine.nxt, quine.lo)

quine.hi1 <- glm(log(Days + 2.5) ~ .^4, data=quine)
quine.nxt1 <- update(quine.hi1, . ~ . - Eth:Sex:Age:Lrn)
quine.lo1 <- glm(log(Days+2.5) ~ 1, data=quine)
anova(quine.hi1, quine.nxt1, quine.lo1, test="F")

S:

  Resid. Df      RSS             Test  Df Sum of Sq  F Value     Pr(F) 
1       118  63.3104                                                  
2       120  64.0990 -Eth:Sex:Age:Lrn  -2  -0.78865 0.734957 0.4817083
3       145 106.7871                  -25 -42.68810 3.182542 0.0000131

...

  Resid. Df Resid. Dev             Test  Df  Deviance  F Value     Pr(F) 
1       118    63.3104                                                  
2       120    64.0990 -Eth:Sex:Age:Lrn  -2  -0.78865 0.734957 0.4817083
3       145   106.7871                  -25 -42.68810 3.182542 0.0000131

R:
Analysis of Variance Table

Model 1: log(Days + 2.5) ~ Eth + Sex + Age + Lrn + Eth:Sex + Eth:Age + 
Model 2: log(Days + 2.5) ~ Eth + Sex + Age + Lrn + Eth:Sex + Eth:Age + 
Model 3: log(Days + 2.5) ~ 1
  Res.Df Res.Sum Sq  Df  Sum Sq F value    Pr(>F)
1    118     63.310                              
2    120     64.099  -2  -0.789  0.7350    0.4817
3    145    106.787 -25 -42.688  3.1967 1.148e-05

[The F value and p value in row 3 differ from those given by S.]

...

  Resid. Df Resid. Dev  Df Deviance       F    Pr(>F)
1       118     63.310                               
2       120     64.099  -2   -0.789   0.735    0.4817
3       145    106.787 -25  -42.688   3.183 1.306e-05
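
The two F values for row 3 can be reproduced by hand from the residual
sums of squares and degrees of freedom in the tables above. The sketch
below is only arithmetic on those reported numbers, not the code of
anovalist.lm or anova.glmlist; the variable names are made up for
illustration.

```r
# Values copied from the tables in this report (models 1..3).
rss <- c(63.3104, 64.0990, 106.7871)  # residual sums of squares
rdf <- c(118, 120, 145)               # residual degrees of freedom

ss <- diff(rss)  # change in RSS between successive models
df <- diff(rdf)  # change in residual df between successive models

# (a) Denominator taken from the largest model in the whole set
#     (model 1), as S and anova.glmlist do.
F_global <- (ss / df) / (rss[1] / rdf[1])
p_global <- pf(F_global, df, rdf[1], lower.tail = FALSE)

# (b) Denominator taken from the larger model of each pair
#     (model 1 for row 2, model 2 for row 3), as anovalist.lm does.
F_pair <- (ss / df) / (rss[-3] / rdf[-3])
p_pair <- pf(F_pair, df, rdf[-3], lower.tail = FALSE)

print(rbind(F_global, F_pair))
print(rbind(p_global, p_pair))
```

The row 2 test is the same under both conventions because model 1 is
the larger model of the first pair; only row 3 changes.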

As of the current patched version of R, anova.glmlist gives consistent
results when the order of the objects is reversed.


Also, the labelling is inconsistent between the lm and glm methods, and
in neither case is it very helpful:

> anova(quine.hi1, quine.nxt1, quine.lo1)
Analysis of Deviance Table 

Response: log(Days + 2.5)
                                                                         Resid. Df
c(\"Eth + Sex + Age + Lrn + Eth:Sex + Eth:Age + Eth:Lrn + Sex:Age + \",        118
c(\"Eth + Sex + Age + Lrn + Eth:Sex + Eth:Age + Eth:Lrn + Sex:Age + \",        120
1                                                                              145
                                                                         Resid. Dev
c(\"Eth + Sex + Age + Lrn + Eth:Sex + Eth:Age + Eth:Lrn + Sex:Age + \",      63.310
c(\"Eth + Sex + Age + Lrn + Eth:Sex + Eth:Age + Eth:Lrn + Sex:Age + \",      64.099
1                                                                           106.787
                                                                          Df
c(\"Eth + Sex + Age + Lrn + Eth:Sex + Eth:Age + Eth:Lrn + Sex:Age + \",     
c(\"Eth + Sex + Age + Lrn + Eth:Sex + Eth:Age + Eth:Lrn + Sex:Age + \",   -2
1                                                                        -25
                                                                         Deviance
c(\"Eth + Sex + Age + Lrn + Eth:Sex + Eth:Age + Eth:Lrn + Sex:Age + \",          
c(\"Eth + Sex + Age + Lrn + Eth:Sex + Eth:Age + Eth:Lrn + Sex:Age + \",    -0.789
1                                                                         -42.68

which gives problems later as the row names are not unique.
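
A minimal sketch of the row-name problem, assuming a recent version of
R in which data.frame() rejects duplicated row names (the exact failure
in the version reported here may differ). The labels are the truncated
strings from the output above, and numbering the models is just one
possible labelling, not a description of any actual fix.

```r
# Truncated formula text used as row labels; the first two are identical.
labels <- c("Eth + Sex + Age + Lrn + Eth:Sex + Eth:Age + Eth:Lrn + Sex:Age + ",
            "Eth + Sex + Age + Lrn + Eth:Sex + Eth:Age + Eth:Lrn + Sex:Age + ",
            "1")
resid_df <- c(118, 120, 145)

# Using the duplicated labels as row names fails; keep the error message.
res <- tryCatch(
  data.frame(Resid.Df = resid_df, row.names = labels),
  error = function(e) conditionMessage(e)
)
print(res)

# One alternative: number the models so that row names are always unique.
model_labels <- paste("Model", seq_along(labels))
tab <- data.frame(Resid.Df = resid_df, row.names = model_labels)
print(tab)
```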



--please do not edit the information below--

Version:
 platform = sparc-sun-solaris2.6
 arch = sparc
 os = solaris2.6
 system = sparc, solaris2.6
 status = Patched
 major = 1
 minor = 1.0
 year = 2000
 month = July
 day = 28
 language = R

Search Path:
 .GlobalEnv, package:MASS, Autoloads, package:base

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

