[R] Understanding lm-based analysis of fractional factorial experiments

Wed Mar 6 11:46:16 CET 2013

All,

I have just returned to R after a decade of absence, and it is good to 
see that R has become such a great success! I'm trying to bring Design 
of Experiments into some aspects of software performance evaluation, and 
to teach myself that, I picked up "Experiments: Planning, Analysis and 
Optimization" by Wu and Hamada. I try to reproduce an analysis in the 
book using lm, but have to conclude I don't understand what lm does in 
this context, even though I end up at the desired result. I'm currently 
using R 2.15.2 on a recent Fedora system, but I get the same result on 
Debian Wheezy and Debian Squeeze. I think the discussion below can be 
followed without having the book at hand though.

I'm working with tables 5.2 and 5.5 in the above mentioned book. Table 
5.2 contains data from the "Leaf spring experiment". The dataset is also 
in this zip file:

ftp://ftp.wiley.com/public/sci_tech_med/experiments-planning/data%20sets.zip

I've learned from the book that the effects can be found using a linear 
model and double the coefficients. So, I do
 > leaf <- 
read.table("/ifi/bifrost/a03/kjekje/fag/experimental-planning/book-datasets/LeafSpring 
table 5.2.dat", col.names=c("B", "C", "D", "E", "Q", paste("r", 1:3, 
sep=""), "yavg", "ssq", "lnssq"))
 > leaf.lm <- lm(yavg ~ B * C * D * E * Q, data=leaf)
 > leaf.lm

Call:
lm(formula = yavg ~ B * C * D * E * Q, data = leaf)

Coefficients:
    (Intercept)              B+              C+              D+ 
      E+
        7.54000         0.07003         0.32333        -0.09668 
0.07668
             Q+           B+:C+           B+:D+           C+:D+ 
   B+:E+
       -0.33670         0.01335         0.11995         0.02335 
      NA
          C+:E+           D+:E+           B+:Q+           C+:Q+ 
   D+:Q+
             NA              NA         0.22915        -0.25745 
0.28255
          E+:Q+        B+:C+:D+        B+:C+:E+        B+:D+:E+ 
C+:D+:E+
        0.05415              NA              NA              NA 
      NA
       B+:C+:Q+        B+:D+:Q+        C+:D+:Q+        B+:E+:Q+ 
C+:E+:Q+
        0.04160        -0.16160        -0.18840              NA 
      NA
       D+:E+:Q+     B+:C+:D+:E+     B+:C+:D+:Q+     B+:C+:E+:Q+ 
B+:D+:E+:Q+
             NA              NA              NA              NA 
      NA
    C+:D+:E+:Q+  B+:C+:D+:E+:Q+
             NA              NA

(seems there is little I can do about the line breaks here, sorry)

However, the book (table 5.5), has 0.221 for the main effect of B and 
0.176, and the above is neither this, nor half of it. Now, I can 
reproduce what's in the book with

 > lm(yavg ~ B, data=leaf)

Call:
lm(formula = yavg ~ B, data = leaf)

Coefficients:
(Intercept)           B+
      7.5254       0.2213

 > lm(yavg ~ C, data=leaf)

Call:
lm(formula = yavg ~ C, data = leaf)

Coefficients:
(Intercept)           C+
      7.5479       0.1763

Assuming lm does in fact double the coefficient in this case, but here 
the intercept varies, which doesn't seem correct, nor can I as trivially 
find the interactions the same way.

Now, I try the effects() function, and get familiar numbers:
 > effects(leaf.lm)
(Intercept)          B+          C+          D+          E+          Q+
   -30.54415    -0.44250     0.35250    -0.05750    -0.20750    -0.51920
       B+:C+       B+:D+       C+:D+       B+:Q+       C+:Q+       D+:Q+
    -0.03415    -0.03915     0.07085    -0.16915     0.33085    -0.10755
       E+:Q+    B+:C+:Q+    B+:D+:Q+    C+:D+:Q+
     0.05415    -0.02080     0.08080    -0.09420

and indeed, I have verified that effects(leaf.lm)/2 gives me the 
expected result.

So, I have found the correct answer, but I don't understand why. I have 
read the documentation for effects() as well as looked through the 
relevant chapter in "Statistical Models in S", but from that all I got 
was that I suppose there is a hint in the phrase "the effects are the 
uncorrelated single-degree-of-freedom", and that is somewhat different 
from the coefficients, but I can't make out from the book (Wu & Hamada) 
why the coefficients should be any different than the effects, to the 
contrary, it is quite clear from equation (5.8) in the book that the 
coefficients they use are effects(leaf.lm)/4.

So, there are at least two points of confusion here, one is how coef() 
differs from effects() in the case of fractional factorial experiments, 
and the other is the factor 1/4 between the coefficients used by Wu & 
Hamada and the values returned by effects() as I would think from theory 
I've read that it should be a factor 2.

Best regards,

Kjetil
-- 
Kjetil Kjernsmo
PhD Research Fellow, University of Oslo, Norway
Semantic Web / SPARQL Query Federation
kjekje at ifi.uio.no http://www.kjetil.kjernsmo.net/