# [R] Three-way ANOVA?

Rolf Turner rolf at math.unb.ca
Fri Jul 30 22:46:19 CEST 2004

```
> I'm a biologist, so please forgive me

Biologists are hard to forgive! :-)

>                                        if my question sounds absurd!

The only absurd question is the one that is not asked.

> I have 3 parameters x1, x2, x3 and a response variable y. The
> sample size is 75. I tried to do the following:
> mylm<-lm(y~ x1 + x2 + x3, data="mydata")

You really do need to clarify your thinking.  And your
terminology.  Parameters are different from ***predictors***!

So x1, x2, and x3 are predictor variables --- you say.
Now just what ***kind*** of predictor variables are they?

> but I can only get stats from anova for the first 2 variables. The
> third comes up as NA. The degrees of freedom for the third variable
> are 0.

If you really are doing ``3 way ANOVA'' each of x1, x2,
and x3 should be a ***factor***.

If that really is the case, then your experimental design
is so ridiculously unbalanced that the x3 factor is
redundant.  I.e. if you know the levels of x1 and x2 then
you know what cell of the model you're in and you don't
need x3 to tell you.

E.g. suppose x1 has levels 1 and 2, x2 has levels ``a'' and
``b'', and x3 has levels mung, gorp, clyde, and melvin ....
but, when x1 = 1 and x2 = a, then x3 = mung,
when x1 = 1 and x2 = b, then x3 = gorp,
when x1 = 2 and x2 = a, then x3 = clyde,
when x1 = 2 and x2 = b, then x3 = melvin.

In such a situation x3 would add no further information
and if you said lm(y~x1+x2+x3) the software would tell
you to go stick your head in a pig, except it's too polite.

However I suspect that this is not actually the case.
I am guessing that you have a one-way ANOVA on a factor
with 3 levels, and that x1, x2, and x3 are indicator
variables such that ``xi'' is equal to 1 when the level
of the factor is the i-th level, and 0 elsewise.

So you are really doing a regression on (numeric variables)
x1, x2, and x3 rather than doing an ANOVA.  And since x1 + x2
+ x3 is identically equal to 1 (``= the constant term in the
regression'') x3 is again redundant and the software
essentially makes no use of it.

This is just my guess; if that's not the case, you'll just
have to tell us more about what ***is*** the case --- be much
more explicit about the problem that you are trying to solve;
tell us the practical background --- if you are going to get

> Is there something else I could do to fit a model that contains all
> these three variables?

In a word no.  If the software tells you that x3 is
irrelevant, it is irrelevant.  Nothing you can do will change
that.  We are talking here about statistics, not religion.

You simply have to fit an ***appropriate*** model to your
data, and then understand the output from the software that
effects the fit.

In order to choose an appropriate model you need to
understand the models from which you make your choice; you
need to know what you are actually ***doing***.  What do you
really want to know?  How will statistical procedures bring
you closer to knowing that?  Don't treat statistics as a
magic wand whose operation is impenetrable to your
understanding.  It isn't.  Neither a magic wand nor
impenetrable.

Finally let me say:  OK, you're a biologist.  There's no
law against that.  But you are ***trying to do statistics***.
It is ridiculous to try to do statistics without having
a clue what it all means.  So, step 1:  Learn some statistics.
Not all of statistics, but enough to ***understand*** the
techniques that you seek to apply.

cheers,

Rolf Turner
rolf at math.unb.ca

```
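
The indicator-variable guess in the reply above can be checked on made-up data. The following is only a sketch under stated assumptions: the factor `g`, its level names, the group means, and the seed are all hypothetical, and only the names `y`, `x1`, `x2`, `x3`, and `mydata` come from the original question. It shows that when `x1 + x2 + x3` is identically 1, `lm()` cannot estimate a coefficient for `x3`, and that fitting the factor itself gives the one-way ANOVA.

```r
# Hypothetical data: 75 observations and one factor g with three levels.
set.seed(42)                                   # arbitrary seed
g <- factor(rep(c("A", "B", "C"), each = 25))  # made-up level names
y <- rnorm(75, mean = c(10, 12, 15)[as.integer(g)])  # made-up group means

# Indicator variables, one per level, as guessed in the reply.
x1 <- as.numeric(g == "A")
x2 <- as.numeric(g == "B")
x3 <- as.numeric(g == "C")
mydata <- data.frame(y, x1, x2, x3, g)

# x1 + x2 + x3 is identically 1, which is the intercept column.
stopifnot(all(mydata$x1 + mydata$x2 + mydata$x3 == 1))

# Regression on all three indicators: the design matrix has 4 columns
# but only rank 3, so no coefficient can be estimated for x3.
fit_ind <- lm(y ~ x1 + x2 + x3, data = mydata)
print(coef(fit_ind))    # the x3 entry is NA
print(fit_ind$rank)     # 3, not 4
print(alias(fit_ind))   # reports x3 as aliased with the other columns

# One-way ANOVA on the factor itself: g carries 2 degrees of freedom.
fit_fac <- lm(y ~ g, data = mydata)
print(anova(fit_fac))

# Both fits reproduce the same three group means.
print(all.equal(fitted(fit_ind), fitted(fit_fac)))
```

Note that `data` is given the data frame itself here, not a quoted name. For the other situation described in the reply, where x1, x2, and x3 are genuine factors, a cross-tabulation such as `table(x1, x2, x3)` shows which cells are actually occupied.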