[R] Basic question about three factor Anova

Tue May 31 01:26:48 CEST 2011

This is really a question about the help file for gl.

The arguments are

gl(n, k, length = n*k, labels = 1:n, ordered = FALSE)

'n' is the number of factor levels.  That seems to be easy enough.

'k' is called the "number of replications".  This is perhaps not the best way to express what it is.  k is the number of times each of the n levels is to be repeated before starting again.  In your example the 'a' levels are repeated 3 times (to cover 'b'), the 'b' levels are repeated once since you read in the values b1 b2 b3 b1 b2 ... and the levels of 'c' are repeated 60 times each since the top 60 values are all c1 and the bottom 60 values are all c2.
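As a small illustration of what 'k' does, the sketch below builds the factor pattern for just the first row of the data table (12 values, read left to right). The names A_row and B_row are made up for this example and are not part of the original code.

```r
# One row of the table holds 12 values in the order
# a1b1, a1b2, a1b3, a2b1, a2b2, a2b3, ..., a4b3.

# k = 3: each 'a' level is repeated 3 times before moving to the next level.
A_row <- gl(4, 3, labels = c("a1", "a2", "a3", "a4"))

# k = 1: each 'b' level appears once, then the b1 b2 b3 pattern starts again.
# length = 12 makes gl() repeat that pattern until 12 values are produced.
B_row <- gl(3, 1, length = 12, labels = c("b1", "b2", "b3"))

# Show the two factors side by side to see how they line up with the table.
print(data.frame(A = A_row, B = B_row))
```

Reading the printed rows from top to bottom should match the column headings of the table from left to right.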

'length' is the overall length of the factor you are generating.  By default it is just n*k, but in this case it has to be 4 (A levels) x 3 (B levels) x 2 (C levels) x 5 (reps in each A:B:C subgroup) = 120, so each pattern is repeated until all 120 values are covered.
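One way to check that the three calls line up with the data layout is to build the factors at full length and count the observations in each A:B:C cell. This is only a sketch of such a check; the names N and cell_counts are introduced here for illustration.

```r
# Total number of observations: 4 A levels x 3 B levels x 2 C levels x 5 reps.
N <- 4 * 3 * 2 * 5

# Same calls as in the original question, with the length written as N.
A <- gl(4, 3,  N, labels = c("a1", "a2", "a3", "a4"))
B <- gl(3, 1,  N, labels = c("b1", "b2", "b3"))
C <- gl(2, 60, N, labels = c("c1", "c2"))

# Cross-tabulate the three factors: a 4 x 3 x 2 table of cell counts.
cell_counts <- table(A, B, C)

print(length(A))              # overall length of each factor
print(all(cell_counts == 5))  # TRUE if every A:B:C cell has exactly 5 values
```

If any of the 'k' or 'length' arguments were wrong for this layout, some cells would end up with a count other than 5.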

The other two arguments are clear enough.

Bill Venables.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Bogdan Lataianu
Sent: Tuesday, 31 May 2011 8:05 AM
To: r-help at r-project.org
Subject: [R] Basic question about three factor Anova

# Read the data using scan():
#
#          a1               a2               a3               a4
#     -------------    -------------    -------------    -------------
#     b1   b2   b3     b1   b2   b3     b1   b2   b3     b1   b2   b3
#     ---  ---  ---    ---  ---  ---    ---  ---  ---    ---  ---  ---
#
# c1:
#     4.1  4.6  3.7    4.9  5.2  4.7    5.0  6.1  5.5    3.9  4.4  3.7
#     4.3  4.9  3.9    4.6  5.6  4.7    5.4  6.2  5.9    3.3  4.3  3.9
#     4.5  4.2  4.1    5.3  5.8  5.0    5.7  6.5  5.6    3.4  4.7  4.0
#     3.8  4.5  4.5    5.0  5.4  4.5    5.3  5.7  5.0    3.7  4.1  4.4
#     4.3  4.8  3.9    4.6  5.5  4.7    5.4  6.1  5.9    3.3  4.2  3.9
#
# c2:
#     4.8  5.6  5.0    4.9  5.9  5.0    6.0  6.0  6.1    4.1  4.9  4.3
#     4.5  5.8  5.2    5.5  5.3  5.4    5.7  6.3  5.3    3.9  4.7  4.1
#     5.0  5.4  4.6    5.5  5.5  4.7    5.5  5.7  5.5    4.3  4.9  3.8
#     4.6  6.1  4.9    5.3  5.7  5.1    5.7  5.9  5.8    4.0  5.3  4.7
#     5.0  5.4  4.7    5.5  5.5  4.9    5.5  5.7  5.6    4.3  4.3  3.8
#
# NOTE: Cut and paste the numbers without the leading # or labels
#

> Y <- scan()
> A <- gl(4,3, 4*3*2*5, labels=c("a1","a2","a3","a4"));
> B <- gl(3,1, 4*3*2*5, labels=c("b1","b2","b3"));
> C <- gl(2,60, 4*3*2*5, labels=c("c1","c2"));
> anova(lm(Y~A*B*C))   # all effects and interactions

In the above example, why is the number of replications 3 for A, 1 for B,
and 60 for C?
And why 4*3*2*5? Is the 5 there because there are 5 lines in each 4*3*2
group?
What is the logic behind this?
