[BioC] Linear modeling for affy experiment

Thu Jun 10 23:06:28 CEST 2004

Hi there,

I am going to do an affy experiment for the first time. I have a few 
questions about the linear model design for my experiment.

I have a 3x2 factorial experiment: three biological samples (wild type 
whole animal (WA), wild type tissue (WT), mutant tissue (MT)) and two 
time points (t1 and t2). The effects of interest are the mutant (M) and 
tissue (T) specific expression and their changes over time (Ti).

I suppose the model should have the following equations (error term 
omitted), and I would have 3 biological replicate affy arrays for each 
condition.

WA.t1 = mu
WT.t1 = mu + T
MT.t1 = mu + T + M + T*M
WA.t2 = mu + Ti
WT.t2 = mu + T + Ti + T*Ti
MT.t2 = mu + T + M + Ti + T*M + M*Ti + T*Ti + T*M*Ti
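
As a rough check on the equations above, here is a minimal Python sketch 
(standard library only, not limma code) that builds the 0/1 design matrix 
implied by the six equations and counts how many of the eight terms can 
actually be estimated. The names TERMS, CONDITIONS, design_matrix and 
matrix_rank are my own for illustration. It assumes 3 replicate arrays 
per condition, and takes residual degrees of freedom as the number of 
arrays minus the rank of the design matrix.

```python
from fractions import Fraction

# The eight terms that appear in the model equations.
TERMS = ["mu", "T", "M", "Ti", "T*M", "T*Ti", "M*Ti", "T*M*Ti"]

# Each condition mapped to the terms present in its equation.
CONDITIONS = {
    "WA.t1": {"mu"},
    "WT.t1": {"mu", "T"},
    "MT.t1": {"mu", "T", "M", "T*M"},
    "WA.t2": {"mu", "Ti"},
    "WT.t2": {"mu", "T", "Ti", "T*Ti"},
    "MT.t2": set(TERMS),
}


def design_matrix(replicates):
    """One 0/1 row per array: `replicates` identical rows per condition."""
    rows = []
    for terms in CONDITIONS.values():
        row = [Fraction(int(t in terms)) for t in TERMS]
        rows.extend(list(row) for _ in range(replicates))
    return rows


def matrix_rank(rows):
    """Rank by Gaussian elimination, exact thanks to Fraction arithmetic."""
    m = [list(r) for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # this column adds no new information
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            if m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


X = design_matrix(replicates=3)   # assumption: 3 arrays per condition
n_arrays = len(X)
rank = matrix_rank(X)             # number of estimable coefficients
residual_df = n_arrays - rank

print("arrays:", n_arrays)
print("terms written in the model:", len(TERMS))
print("estimable coefficients (rank):", rank)
print("residual degrees of freedom:", residual_df)
```

Under these assumptions the rank comes out as 6 rather than 8, since six 
conditions can support at most six independent coefficients: there is no 
mutant whole-animal sample, so M cannot be separated from T*M, nor M*Ti 
from T*M*Ti. The residual degrees of freedom would then be 18 - 6 = 12.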

My questions are:

1. How many degrees of freedom do I have in the model? How do I calculate 
degrees of freedom for a linear model in general? In my case, is it 3 
arrays * 6 conditions - 8 coefficients to be estimated = 10 degrees of 
freedom?

2. If I want to increase my degrees of freedom, is it true that I can do 
so by adding replicates? If so, is there a difference between 
replicating a sample with more coefficients (e.g. MT.t2) and a sample 
with fewer coefficients (e.g. WA.t1)? It seems to me that replicating a 
sample with more coefficients is better, but I don't know how to state 
this statistically.

3. What is the formal way to determine whether an interaction term is 
meaningful/significant in the model? Is it by the p-value? And should I 
remove the term and refit the model if it is not significant and is 
deemed unimportant on biological grounds? Or should I just fit the full 
model once and go ahead and interpret the contrasts of interest? Is 
there a formal way (e.g. the diagnostics people use to assess ANOVA 
models) of evaluating the quality of the whole fitted model? Or do I not 
need to worry about this at all?

5. I am confused about the multiple hypothesis testing adjustment when 
there are many contrasts. (I know it is better to use the 
p-values/B/moderated t only for ranking genes, but I am curious to 
know.) For example, in limma one would extract the contrast of interest 
and list the candidate genes with topTable using an FDR adjustment, but 
isn't it true that this adjusts only for the genes tested within that 
one contrast? When I evaluate all possible contrasts, how can I adjust 
for multiple testing over the genes in all the contrasts that I have 
made?

6. A minor question. What do M and A in the topTable output for a 
coefficient/contrast mean for affy data? If A stands for the log2 
intensity estimate for that coefficient/contrast, is M the log2 ratio of 
(mu + (coefficient or contrast estimate))/mu?

Thanks a lot for answering my questions. Any other advice for my design 
is also welcome.

Best regards,
Fai
-- 
Yuk Fai Leung
Department of Molecular and Cellular Biology
Harvard University
BL 2079, 16 Divinity Avenue
Cambridge, MA 02138
Tel: 617-495-2599
Fax: 617-496-3321
email: yfleung at mcb.harvard.edu; yfleung at genomicshome.com
URL: http://genomicshome.com