[R] Interpretation of call to aov()

Sat Aug 5 14:15:58 CEST 2006

Hi all,

I've been reading about aov() at
http://www.psych.upenn.edu/~baron/rpsych/rpsych.html and
http://davidmlane.com/hyperstat/intro_ANOVA.html, and
I am trying to use this test in experiments with my simulator.

What I would like ANOVA to tell me is whether the differences I see
when plotting the mean performance per method are significant,
and also whether this depends on the problem size (bigger means
more complex).
I would be very grateful if someone on this list who is more
mathematically skilled could tell me whether I'm drawing the correct
conclusions.

> data
    performance  method problem
1   146780.0000      -f     960
2     4654.0000      -f     160
3    45840.0000      -f     320
4    54750.0000      -f     320
5    91750.0000      -f     480
6     7452.0000      -f     160
7     8866.0000      -f     160
8     8513.0000      -f     160
9   139520.0000      -f     960
10   85380.0000      -f     480
<snip>

> str(data)
`data.frame':   419 obs. of  3 variables:
 $ performance: num  146780   4654  45840  54750  91750 ...
 $ method     : Factor w/ 7 levels "-f","-f -q","-h0 -r0",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ problem    : int  960 160 320 320 480 160 160 160 960 480 ...

>   summary(aov(performance ~ method * problem, data=data))
                Df     Sum Sq    Mean Sq F value    Pr(>F)
method           6 3.3185e+11 5.5308e+10  416.91 < 2.2e-16 ***
problem          1 5.7141e+11 5.7141e+11 4307.26 < 2.2e-16 ***
method:problem   6 9.8891e+10 1.6482e+10  124.24 < 2.2e-16 ***
Residuals      405 5.3728e+10 1.3266e+08
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
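A side note on the Df column: str() shows problem as an integer, so the formula treats it as a numeric covariate. aov() therefore fits a single linear slope for it (1 Df), and method:problem lets that slope differ per method (methods - 1 Df), rather than fitting one effect per problem size. The sketch below uses simulated data to show the difference when problem is wrapped in factor(); the data frame `sim`, its method labels and its numbers are hypothetical, not the simulator's output.

```r
# Hypothetical simulated data, NOT the poster's measurements:
# 3 methods x 4 problem sizes x 5 replicate runs = 60 rows.
set.seed(1)
sim <- expand.grid(method  = factor(c("A", "B", "C")),
                   problem = c(160L, 320L, 480L, 960L),
                   run     = 1:5)
sim$performance <- 100 * sim$problem + rnorm(nrow(sim), sd = 500)

# problem as an integer: one linear slope (1 Df); the interaction
# allows a different slope per method (methods - 1 Df)
fit_num <- aov(performance ~ method * problem, data = sim)

# problem as a factor: one effect per size (sizes - 1 Df); the
# interaction then has (methods - 1) * (sizes - 1) Df
fit_fac <- aov(performance ~ method * factor(problem), data = sim)

summary(fit_num)
summary(fit_fac)
```

Both models are legitimate; the numeric version assumes performance changes linearly with problem size within each method, while the factor version makes no such assumption at the cost of more Df.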

I interpret this data as follows:

-1- The performance depends on the chosen method.
If I compute the overall mean performance for each method, these means
are significantly different. This means that the method with the
greatest mean is significantly better than at least some of the other
methods (and not worse than any other method).

-2- The performance depends on the problem complexity.
This is not so interesting. In my setting it is trivial that performance
is worse for more complex problems.

-3- There is an interaction between method and complexity. In other words,
one cannot simply rank the methods from bad to good without taking the
problem complexity into account (for simple problems method A might be
the best, while for complex problems another method might be better).
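One way to see what the interaction means in practice is to tabulate the mean performance of each method at each problem size and order the methods within each size. The sketch uses simulated data (the data frame `sim`, the slopes and the offsets are hypothetical) in which the ordering of the methods flips between small and large problems:

```r
# Hypothetical simulated data in which the methods scale differently
# with problem size: 3 methods x 4 sizes x 5 runs.
set.seed(7)
sim <- expand.grid(method  = factor(c("A", "B", "C")),
                   problem = c(160, 320, 480, 960),
                   run     = 1:5)
m <- as.character(sim$method)
slope  <- c(A = 30, B = 20, C = 10)      # hypothetical per-size slope
offset <- c(A = 0, B = 2500, C = 5000)   # hypothetical fixed offset
sim$performance <- offset[m] + slope[m] * sim$problem +
  rnorm(nrow(sim), sd = 200)

# Mean performance for every method x problem-size cell
cell_means <- with(sim, tapply(performance,
                               list(problem = problem, method = method),
                               mean))
print(round(cell_means))

# Methods ordered from smallest to largest mean, separately per size
ranking <- apply(cell_means, 1,
                 function(row) paste(names(sort(row)), collapse = " < "))
print(ranking)

# The interaction term tests whether the method differences vary with size
summary(aov(performance ~ method * problem, data = sim))
```

interaction.plot(sim$problem, sim$method, sim$performance) from the stats package draws the same table as one line per method; clearly non-parallel or crossing lines are what a significant interaction looks like.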

I have not used Error() in my call to aov().
I've seen it used as Error(subj/(shape * color)).
But I do not have subjects. Or rather, I believe I have only one, which
is my simulator. Am I correct about that? Or should I use something like
Error(method * problem)?
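For reference, Error() is meant for designs in which some unit (a subject in the psychology examples) is measured under every level of a within-unit factor. In a simulator, a comparable situation might arise if, say, the same problem instance or random seed were run under every method; that unit could then play the role of subj. The sketch below is hypothetical: the `instance` column does not exist in the data above and is introduced only to illustrate the syntax.

```r
# Hypothetical design: 8 problem instances (e.g. random seeds), each
# run once under each of 3 methods. 'instance' is an illustrative
# column, not part of the poster's data.
set.seed(3)
sim <- expand.grid(method   = factor(c("A", "B", "C")),
                   instance = factor(1:8))
inst_effect <- rnorm(8, sd = 2000)          # hypothetical per-instance shift
offset <- c(A = 0, B = 1000, C = 3000)      # hypothetical method offsets
sim$performance <- 10000 + inst_effect[as.integer(sim$instance)] +
  offset[as.character(sim$method)] + rnorm(nrow(sim), sd = 500)

# Error(instance/method): method is tested in the instance:method
# stratum, so variation between instances is kept out of its error term
fit <- aov(performance ~ method + Error(instance/method), data = sim)
summary(fit)
```

Note that the terms inside Error() are the grouping units, not the fixed effects being tested; method appears both in the main formula and nested under instance.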

Thanks in advance,
JeeBee.