[BioC] limma regression / fdr permutation correction

Matthew Hannah Hannah at mpimp-golm.mpg.de
Thu Mar 24 15:00:48 CET 2005

Easter greetings, and a question ;-)

I have 8 genotypes with 3 biological replicas per genotype. I also have
a quantified variable (growth) for each.
Currently I use a model like
x <-
c(1.5,1.5,1.5,2.3,2.3,2.3,3.6,3.6,3.6,4.2,4.2,4.2,4.8,4.8,4.8,5.2,5.2,5.
2,5.5,5.5,5.5,6,6,6)
design <- model.matrix(~x)
fit <- lmFit(esetgcrma, design)
ebfit <- eBayes(fit)

Firstly, obviously it is fitting each of the 3 replica arrays to the
same value. However, we have measured the growth several times and
although they are under the same conditions these are not directly
paired to the 3 replica. So the growth is really 1.5 +/- 0.3, 2.3 +/-
0.2...etc. Is there any way to take this into account (ie: if different
growth have bigger SD than others)? Should the design be fit against 8
groups and then compared to 8 values - can/how would you do this?

Secondly, the eBayes is moderating the p-values for the returned
coefficients - intercept and slope. The intercept one is obviously not
of interest. The slope one is the probability of it being different from
zero so I assume that tells you if a gene correlates with growth. Is the
eBayes removing genes that have low p-values but very shallow slopes as
I would expect or is there more to consider?

Finally, I'd like to ask about permutation testing and using the results
to correct the p-values using an fdr procedure. As there are only 8
groups permutating the arrays randomly will also be testing if the
groupings make sense. I think the better way is to just permutate the 8
growth values but preserve the group structure as this will indicate how
many genes correlate by chance. My question is once I've done my (say)
1000 permutations what should I do with the computed p-values and how do
I use them to correct the 'real' p.values. I'm looking for practical