[R] Need help understanding output from aov and from anova

Suman Sundaresh sumansun at gmail.com
Thu Jun 4 18:08:07 CEST 2009


Hi Steve,

Thanks for your response and also for the useful information about the
latest version warning.

On the earlier version that I was using (2.6.2 Win), I expected at
least an error or warning when submitting an obvious corner case whose
result should be "NaN", but anova() gives neither. The t-test does
give an error. And when each group has two samples (with all values
identical), the P-value from anova() is NaN, which is what one would
expect.

It is great to know that the latest version at least addresses this
with a warning.

R version 2.6.2 (2008-02-08)
Copyright (C) 2008 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

<snip>

# No warning/error when there are 2 samples in one group and 1 in the other
> vtot=c(200,200,200)
> fac=as.factor(c(1,1,2))
> anova(lm(vtot~fac))
Analysis of Variance Table

Response: vtot
          Df     Sum Sq    Mean Sq F value Pr(>F)
fac        1 1.4722e-27 1.4722e-27  0.3333 0.6667
Residuals  1 4.4166e-27 4.4166e-27
>

#### T-test works as expected with an error
> t.test(vtot~fac,var.equal=TRUE)
Error in t.test.default(x = c(200, 200), y = 200, var.equal = TRUE) :
  data are essentially constant
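
In a batch run over many variables this error would stop the loop, so it may help to trap it. A minimal sketch, assuming a hypothetical wrapper named safe_ttest_p (not part of R) that returns NaN whenever t.test() refuses the data:

```r
# safe_ttest_p is a hypothetical helper, not a base R function.
# It returns the two-sample t-test p-value, or NaN if t.test() stops
# with an error (for example "data are essentially constant").
safe_ttest_p <- function(y, g) {
  d <- data.frame(y = y, g = g)
  tryCatch(
    t.test(y ~ g, data = d, var.equal = TRUE)$p.value,
    error = function(e) NaN
  )
}

vtot <- c(200, 200, 200)
fac <- as.factor(c(1, 1, 2))
p <- safe_ttest_p(vtot, fac)
is.nan(p)   # TRUE for the constant data above
```

Note that this maps every error to NaN, not only the constant-data one; inspecting conditionMessage(e) inside the handler would be one way to be more selective.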


# P-value is NaN when we have at least two samples in each group
> vtot=c(200,200,200,200)
> fac=as.factor(c(1,1,2,2))
> anova(lm(vtot~fac))
Analysis of Variance Table

Response: vtot
          Df Sum Sq Mean Sq F value Pr(>F)
fac        1      0       0
Residuals  2      0       0

> anova(lm(vtot~fac))[1,5]
[1] NaN
>
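
Since anova() gives NaN in one constant-data case and a noise-driven p-value in the other, one possible workaround is to test the response for variability before fitting. The following is only a sketch: is_essentially_constant and safe_anova_p are hypothetical names, and the tolerance of 10 * .Machine$double.eps is an assumed choice, in the same spirit as the t.test error above.

```r
# Hypothetical helpers, not part of base R.
# Flags a response whose spread is negligible relative to its magnitude,
# in which case any F value would be a ratio of rounding noise.
is_essentially_constant <- function(y, tol = 10 * .Machine$double.eps) {
  scale <- max(abs(y))
  if (scale == 0) return(TRUE)          # all zeros
  diff(range(y)) <= tol * scale
}

# Returns NA for an essentially constant response; otherwise the
# p-value from the one-way ANOVA table (row 1, column 5, as above).
safe_anova_p <- function(y, g) {
  if (is_essentially_constant(y)) return(NA_real_)
  anova(lm(y ~ g))[1, 5]
}

vtot <- c(200, 200, 200)
fac <- as.factor(c(1, 1, 2))
safe_anova_p(vtot, fac)   # NA instead of 0.6667
```

With this guard both layouts, c(1,1,2) and c(1,1,2,2), give a missing value for constant data, which is easier to filter afterwards than a mix of NaN and spurious p-values.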

Best,
Suman.


On Wed, Jun 3, 2009 at 9:24 PM, Steven McKinney <smckinney at bccrc.ca> wrote:
> Hi Suman,
>
> What version of R are you running?
>
> In R 2.9.0 running your first example yields a warning
>
>  Warning message:
>  In anova.lm(lm(vtot ~ fac)) :
>   ANOVA F-tests on an essentially perfect fit are unreliable
>
> so some adept R developer has taken the time to figure
> out how to warn you about such a problem.
>
> Perhaps someone will add this to aov() at some point as well.
>
> The only variability in this problem is that introduced
> by machine precision rounding errors.
>
> The exercise of submitting data with no variability to
> a program designed to assess variability cannot be expected
> to produce meaningful output, so there's nothing to
> understand except the issue of machine precision.
> Machine roundoff error is an important topic, so I'd
> recommend learning about that issue, which will do most
> to help understand these examples.
>
> Best
>
> SteveM
>
>
> R version 2.9.0 (2009-04-17)
> Copyright (C) 2009 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
>  Natural language support but running in an English locale
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
>> vtot=c(7.29917, 7.29917, 7.29917)  #identical values
>> fac=as.factor(c(1,1,2))   #group 1 has first two elements, group 2 has the 3rd element
>> anova(lm(vtot~fac))
> Analysis of Variance Table
>
> Response: vtot
>          Df     Sum Sq    Mean Sq F value Pr(>F)
> fac        1 1.6818e-30 1.6818e-30  0.3333 0.6667
> Residuals  1 5.0455e-30 5.0455e-30
> Warning message:
> In anova.lm(lm(vtot ~ fac)) :
>  ANOVA F-tests on an essentially perfect fit are unreliable
>>
>> summary(aov(vtot~fac))
>            Df     Sum Sq    Mean Sq F value Pr(>F)
> fac          1 1.6818e-30 1.6818e-30  0.3333 0.6667
> Residuals    1 5.0455e-30 5.0455e-30
>>
>> fac=as.factor(c(1,2,2))
>> anova(lm(vtot~fac))
> Analysis of Variance Table
>
> Response: vtot
>          Df     Sum Sq    Mean Sq    F value    Pr(>F)
> fac        1 6.7274e-30 6.7274e-30 1.3340e+32 < 2.2e-16 ***
> Residuals  1  5.043e-62  5.043e-62
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> Warning message:
> In anova.lm(lm(vtot ~ fac)) :
>  ANOVA F-tests on an essentially perfect fit are unreliable
>>
>
>
>
>
>
> Steven McKinney, Ph.D.
>
> Statistician
> Molecular Oncology and Breast Cancer Program
> British Columbia Cancer Research Centre
>
> email: smckinney at bccrc.ca
> tel: 604-675-8000 x7561
>
> BCCRC
> Molecular Oncology
> 675 West 10th Ave, Floor 4
> Vancouver B.C.
> V5Z 1L3
>
> Canada
>
>
>
>
>
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>> On Behalf Of Suman Sundaresh
>> Sent: Wednesday, June 03, 2009 3:55 PM
>> To: r-help at r-project.org
>> Subject: [R] Need help understanding output from aov and from anova
>>
>> Hi all,
>>
>> I noticed something strange when I ran aov and anova.
>>
>> vtot=c(7.29917, 7.29917, 7.29917)  #identical values
>> fac=as.factor(c(1,1,2))   #group 1 has first two elements, group 2 has
>> the 3rd element
>>
>> When I run:
>> > anova(lm(vtot~fac))
>> Analysis of Variance Table
>>
>> Response: vtot
>>           Df     Sum Sq    Mean Sq F value Pr(>F)
>> fac        1 1.6818e-30 1.6818e-30  0.3333 0.6667
>> Residuals  1 5.0455e-30 5.0455e-30
>>
>>
>> I get a p-value of 0.6667. This seems strange to me. I would have
>> expected the p-value to be NaN.
>>
>> Again, when I run:
>> > summary(aov(vtot~fac))
>>             Df     Sum Sq    Mean Sq F value Pr(>F)
>> fac          1 1.6818e-30 1.6818e-30  0.3333 0.6667
>> Residuals    1 5.0455e-30 5.0455e-30
>>
>> Again same p-value.
>>
>>
>> Now, if I set fac to c(1,2,2), which essentially just switches the
>> groups:
>> fac=as.factor(c(1,2,2))
>>
>> And run,
>> > anova(lm(vtot~fac))
>> Analysis of Variance Table
>>
>> Response: vtot
>>           Df     Sum Sq    Mean Sq    F value    Pr(>F)
>> fac        1 6.7274e-30 6.7274e-30 1.3340e+32 < 2.2e-16 ***
>> Residuals  1  5.043e-62  5.043e-62
>> ---
>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>>
>> The p-value is highly significant, which again looks very strange.
>>
>> Please could someone shed some light on what I may be missing here?
>>
>> Thanks very much.
>> Suman.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
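
To put the sums of squares quoted above in perspective, here is a small sketch (the names rel_ss and p_from_f are illustrative, and the exact values will vary with platform and R version). It compares the sums of squares with the raw size of the data and shows that the reported p-value follows mechanically from the F ratio:

```r
vtot <- c(7.29917, 7.29917, 7.29917)   # identical values
fac <- as.factor(c(1, 1, 2))

# suppressWarnings() hides the "essentially perfect fit" warning of newer R
tab <- suppressWarnings(anova(lm(vtot ~ fac)))

# Sums of squares relative to the raw sum of squares of the data:
# anything near or below .Machine$double.eps is rounding noise
rel_ss <- tab[["Sum Sq"]] / sum(vtot^2)
rel_ss
.Machine$double.eps

# The p-value is just the upper tail of F(1, 1) at the reported F value,
# so F = 1/3 reproduces the 0.6667 seen above
p_from_f <- pf(1/3, df1 = 1, df2 = 1, lower.tail = FALSE)
p_from_f
```

The F value itself is a ratio of two rounding errors, so which group holds the single observation can move it from 0.3333 to 1e+32 without the data carrying any real signal.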



