[R] Understanding lm-based analysis of fractional factorial experiments

S Ellison S.Ellison at LGCGroup.com
Thu Mar 7 15:32:05 CET 2013


> So, there are at least two points of confusion here, one is 
> how coef() differs from effects() in the case of fractional 
> factorial experiments, and the other is the factor 1/4 
> between the coefficients used by Wu & Hamada and the values 
> returned by effects() as I would think from theory I've read 
> that it should be a factor 2.

Some observations to throw into the mix here.

First, the model.
 
> leaf.lm  <- lm(yavg ~ B * C * D * E * Q, data=leaf)

is indeed unnecessarily complicated. You can request all second-order interactions (accepting that some will be NA) with:
> leaf.lm  <- lm(yavg ~ ( B + C + D + E + Q)^2, data=leaf)

or of course you can specify only the ones you know will be present:
> leaf.lm  <- lm(yavg ~ B + C + D + E + Q + B:C + B:D + B:E + B:Q + C:Q + D:Q + E:Q, data=leaf)

I cheated a bit: 
> leaf.avg <- leaf[, c(1:5, 9)]
> leaf.lm.avg  <- lm(yavg ~ (.)^2, data=leaf.avg)
	#less typing, same model (with NAs for aliased interactions)
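
If you want to convince yourself these specifications really are the same fit, the fitted values are one quick check; for this fractional factorial they should agree (a minimal sketch, assuming the leaf data frame from your original post and the fits above):

> # Sanity check: same column space, so the fitted values should match;
> # the aliased terms just show up as NA coefficients.
> all.equal(fitted(leaf.lm), fitted(leaf.lm.avg))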

Now let's look at those effects.
First, look at the model.matrix for leaf.lm. 

> model.matrix(leaf.lm)
This shows you what your yavg is actually being regressed against - it's a matrix of dummy codings for your (character) levels "+" and "-". 
Notice the values for B (under "B+") and the intercept. The first row (which has "-" for B in the leaf data set) has a 0 in the B column; the second has a 1. So in this matrix, 1 corresponds to "+" and 0 corresponds to "-", which you can read as "not different from the intercept". This comes from the way the contrasts have been chosen: these are 'treatment contrasts', R's default for factors. Pretty much the same goes for all the other factors. This structure corresponds to measuring the 'alternate' level of every factor - always "+", because the first level is "-" - against the intercept.

What the intercept represents is not immediately obvious if you're used to seeing the mean of the study as the intercept; it estimates the mean response when all factors are at their first ('zero-coded') level. That's really useful if you want to test a series of alternate treatments against a control, and it is R's default. But it's not what most industrial experiments based on 2-level fractional factorials want; they usually want differences between levels. That is why some folk would call R's defaults 'odd defaults'.
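
If you want to see the coding rule itself, contr.treatment() will show it directly - 0 for the first (baseline) level, 1 for the second (this assumes the leaf data frame from your original post):

> # The treatment-contrast coding for a two-level factor, and the level
> # order for B (the first level listed is the baseline).
> contr.treatment(2)
> levels(leaf$B)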

To get something closer to the usual fractional factorial usage, you need to change those dummy codings from (0, 1) for ("-", "+") to something like (+1, -1). You do that by changing the contrasts. To get what most industrial experimenters would expect, use sum-to-zero contrasts. So we do this instead (using my leaf.avg data frame above):

> options(contrasts=c("contr.sum", "contr.poly"))
> (leaf.lm.sum <- lm(yavg ~ (.)^2, data=leaf.avg))
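
(If you would rather not change the global option, the same thing can be done for a single fit via lm's contrasts argument - something like this sketch, assuming the factor names in leaf.avg:)

> # Sum-to-zero contrasts for this fit only, leaving the global option alone
> leaf.lm.sum2 <- lm(yavg ~ (.)^2, data=leaf.avg,
+                    contrasts=list(B="contr.sum", C="contr.sum", D="contr.sum",
+                                   E="contr.sum", Q="contr.sum"))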

Now we can see that the coefficient for B is indeed half the difference between levels, as you might have expected. And the intercept is now equal to the mean of the data, as you might expect from much of the industrial experiment design literature. 
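
You can check that numerically (a sketch, assuming the leaf.avg data frame; under contr.sum the coefficient for B is labelled "B1"):

> # Level means for B, half their difference, and the sum-contrast fit's
> # coefficients; the magnitudes should match (the sign question is next).
> with(leaf.avg, tapply(yavg, B, mean))
> diff(with(leaf.avg, tapply(yavg, B, mean))) / 2
> coef(leaf.lm.sum)[c("(Intercept)", "B1")]
> mean(leaf.avg$yavg)   # should equal the intercept
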
So - why half, and why is it negative?

Look at the new model matrix:
> model.matrix(leaf.lm.sum)

This time, B is full of +1 and -1, instead of 0 and 1. So first, B+ and B- are both different (in equal and opposite directions) from the intercept. 
Second, the range allocated to B is (+1 - -1) = 2, so the change in yavg per _unit_ change in B (the lm coefficient for B) is half the difference between levels for B.
Finally, look at the particular allocation of model matrix values for B. The first row has +1, the second row -1. That's because contr.sum has allocated +1 to the first level of the factor B, which (if you look at levels(leaf$B)) is "-", because factor() uses a default alphabetical order. (You could change that for all your factors if you wanted, for example with leaf$B <- factor(leaf$B, levels=c("+","-")), and you'd then have +1 for "+" and -1 for "-".) In the meantime, though, lm has been told that "-" corresponds to an _increase_ in the dummy coding for B. Since the mean for B=="-" is lower than the intercept, you get a negative coefficient.
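
For reference, the coding matrix itself, and the reordering suggested above (again assuming the leaf data frame; you would then rebuild leaf.avg and refit):

> # contr.sum gives the first level +1 and the second level -1
> contr.sum(2)
> # Put "+" first so that "+" gets the +1 coding
> leaf$B <- factor(leaf$B, levels=c("+","-"))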

effects() is doing something quite different; the help page tells you what it's doing mathematically. It has an indirect physical interpretation: for simple designs like this, effects() output for the coefficients comes out as the coefficient multiplied by sqrt(n), where n is the number of observations in the experiment. But it is not giving you the effect of a factor in the sense of an effect of change in factor level on mean response.
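
A rough check of that relationship for this balanced +/-1 design (the signs can differ because of how the QR decomposition is built, so compare magnitudes):

> # |effect| should equal |coefficient| * sqrt(n) here
> abs(effects(leaf.lm.sum)["B1"])
> abs(coef(leaf.lm.sum)["B1"]) * sqrt(nrow(leaf.avg))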

Hope some of that is useful to you.

S Ellison
