[R] Three-way ANOVA?

Rolf Turner rolf at math.unb.ca
Fri Jul 30 22:46:19 CEST 2004

> I'm a biologist, so please forgive me

	Biologists are hard to forgive! :-)

>                                        if my question sounds absurd!

	The only absurd question is the one that is not asked.

> I have 3 parameters x1, x2, x3 and a response variable y. The
> sample size is 75. I tried to do the following:
> mylm<-lm(y~ x1 + x2 + x3, data="mydata")

	You really do need to clarify your thinking.  And your
	terminology.  Parameters are different from ***predictors***!

	So x1, x2, and x3 are predictor variables --- you say.
	Now just what ***kind*** of predictor variables are they?
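
	A quick way to answer that question is to ask R itself.  The
	sketch below uses a small made-up data frame (the name mydata
	and its contents are assumptions for illustration only, not the
	poster's data) and shows how to inspect the class of each column
	and turn a grouping variable into a factor.  Note too that the
	data argument of lm() expects the data frame object itself,
	i.e. data = mydata, not the quoted name.

```r
# Hypothetical data: x1 and x2 are meant to be grouping variables,
# x3 is a genuine numeric covariate.
mydata <- data.frame(
  y  = c(3.1, 4.0, 2.7, 5.2, 3.9, 4.4),
  x1 = c(1, 2, 1, 2, 1, 2),
  x2 = c("a", "a", "b", "b", "a", "b"),
  x3 = c(0.5, 1.2, 0.7, 1.9, 1.1, 1.6),
  stringsAsFactors = FALSE
)

# What kind of variable is each column right now?
classes_before <- sapply(mydata, class)
print(classes_before)

# Grouping variables must be factors if you want an ANOVA-style fit;
# a numeric x1 would be treated as a regression slope instead.
mydata$x1 <- factor(mydata$x1)
mydata$x2 <- factor(mydata$x2)

classes_after <- sapply(mydata, class)
print(classes_after)

# Pass the data frame itself (unquoted) to lm().
fit <- lm(y ~ x1 + x2 + x3, data = mydata)
```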

> but i can only get stats from anova for the first 2 variables. The
> third comes up as NA. The degrees of freedom for the third variable
> are 0.

	If you really are doing ``3 way ANOVA'' each of x1, x2,
	and x3 should be a ***factor***.

	If that really is the case, then your experimental design
	is so ridiculously unbalanced that the x3 factor is
	redundant.  I.e. if you know the levels of x1 and x2 then
	you know what cell of the model you're in and you don't
	need x3 to tell you.

	E.g. suppose x1 has levels 1 and 2, x2 has levels ``a'' and
	``b'', and x3 has levels mung, gorp, clyde, and melvin ....
	but, when x1 = 1 and x2 = a, then x3 = mung,
	     when x1 = 1 and x2 = b, then x3 = gorp,
	     when x1 = 2 and x2 = a, then x3 = clyde,
	     when x1 = 2 and x2 = b, then x3 = melvin. 

	In such a situation x3 would add no further information
	and if you said lm(y~x1+x2+x3) the software would tell
	you to go stick your head in a pig, except it's too polite.
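
	The layout just described can be simulated to see how the
	software responds.  This is only a sketch with made-up data
	(five replicates per cell and a pure-noise response are
	assumptions).  It shows that the fit is rank deficient, that
	some of the x3 coefficients come back as NA, and that the
	fitted values are exactly the cell means you would get from
	x1 and x2 alone.

```r
set.seed(1)

# 2 x 2 layout with 5 replicates per cell (replicate count is arbitrary).
d <- expand.grid(x1 = factor(1:2), x2 = factor(c("a", "b")), rep = 1:5)

# x3 is completely determined by the (x1, x2) cell, as in the example.
lookup <- c("1.a" = "mung", "1.b" = "gorp", "2.a" = "clyde", "2.b" = "melvin")
d$x3 <- factor(unname(lookup[paste(d$x1, d$x2, sep = ".")]))

# Response is pure noise; only the structure of the design matters here.
d$y <- rnorm(nrow(d))

fit <- lm(y ~ x1 + x2 + x3, data = d)
print(coef(fit))   # some x3 coefficients are reported as NA
print(fit$rank)    # rank is 4: one parameter per cell, no more

# The cell-means model from x1 and x2 alone gives the same fitted values.
cellfit <- lm(y ~ x1 * x2, data = d)
same_fit <- isTRUE(all.equal(fitted(fit), fitted(cellfit)))
print(same_fit)
```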

	However I suspect that this is not actually the case.
	I am guessing that you have a one-way ANOVA on a factor
	with 3 levels, and that x1, x2, and x3 are indicator
	variables such that ``xi'' is equal to 1 when the level
	of the factor is the i-th level, and 0 elsewise.

	So you are really doing a regression on (numeric variables)
	x1, x2, and x3 rather than doing an ANOVA.  And since x1 + x2
	+ x3 is identically equal to 1 (``= the constant term in the
	regression'') x3 is again redundant and the software
	essentially makes no use of it.
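
	If my guess is right, the situation can be reproduced as
	follows.  Again this is a sketch on simulated data (the factor
	name g, its level names, and the group means are assumptions);
	only the sample size of 75 is taken from the question.  It
	shows the NA for x3 and the fit you presumably wanted, which
	uses the factor directly.

```r
set.seed(42)

# One factor with 3 levels, 25 observations each (75 in total).
g <- factor(rep(c("A", "B", "C"), each = 25))

d <- data.frame(
  g  = g,
  x1 = as.numeric(g == "A"),   # indicator of level 1
  x2 = as.numeric(g == "B"),   # indicator of level 2
  x3 = as.numeric(g == "C")    # indicator of level 3
)
d$y <- c(1, 2, 3)[as.integer(g)] + rnorm(75)

# The three indicators always sum to 1, i.e. to the intercept column.
indicators_sum_to_one <- all(d$x1 + d$x2 + d$x3 == 1)

# Regression on the indicators: x3 is aliased and its coefficient is NA.
fit_ind <- lm(y ~ x1 + x2 + x3, data = d)
print(coef(fit_ind))

# One-way ANOVA on the factor itself: 2 degrees of freedom for g.
fit_fac <- lm(y ~ g, data = d)
print(anova(fit_fac))

# Both fits describe the same three group means.
same_fit <- isTRUE(all.equal(fitted(fit_ind), fitted(fit_fac)))
```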

	This is just my guess; if that's not the case, you'll just
	have to tell us more about what ***is*** the case --- be much
	more explicit about the problem that you are trying to solve;
	tell us the practical background --- if you are going to get
	any useful advice.

> Is there something else i could do to fit a model that contains all
> these three variables?

	In a word, no.  If the software tells you that x3 is
	irrelevant, it is irrelevant.  Nothing you can do will change
	that.  We are talking here about statistics, not religion.

	You simply have to fit an ***appropriate*** model to your
	data, and then understand the output from the software that
	effects the fit.

	In order to choose an appropriate model you need to
	understand the models from which you make your choice; you
	need to know what you are actually ***doing***.  What do you
	really want to know?  How will statistical procedures bring
	you closer to knowing that?  Don't treat statistics as a
	magic wand whose operation is impenetrable to your
	understanding.  It is neither a magic wand nor impenetrable.

	Finally let me say:  OK, you're a biologist.  There's no
	law against that.  But you are ***trying to do statistics***.
	It is ridiculous to try to do statistics without having
	a clue what it all means.  So, step 1:  Learn some statistics.
	Not all of statistics, but enough to ***understand*** the
	techniques that you seek to apply.


						Rolf Turner
						rolf at math.unb.ca
