[BioC] Help on factorial experiment analysis using limma

Sat Sep 13 19:59:27 MEST 2003

Dear Gordon,

Thanks a lot for your prompt reply. I have more questions.

> This is a saturated direct design for a two-way factorial experiment.
> Good.

Is it still OK to use limma to analyze the data if one or both of the
diagonal experiment arrows are missing? I just realized that I have another
experiment in which I want to discard the diagonal experiments for various
reasons. It seems I can still run the limma analysis on the data without the
diagonals, but I have no clue whether this is a legitimate analysis or not.

> This is a good question. For the purposes of the limma analysis, I think I
> would treat the technical reps as ordinary reps, i.e., treat the experiment
> as having 12 independent arrays. This will have the consequence that the
> standard errors from the analysis will be slightly under-estimated, i.e.,
> the significance results will be slightly over-stated, but the ranking of
> your genes in terms of evidence for differential expression will be close
> to optimal.

This reminds me of a very general question about replication that I have
had for a while. What is the proper way to analyze replicated data when the
raw data contain both biological and technical replication? Consider an
example in which there are a hundred samples from different cancer patients
and the microarray experiment for each sample is repeated three times. I
have heard that some people would treat biological and technical replicates
equally in this case. But isn't it true that the technical replicates have
smaller variance, are correlated with each other, and should therefore be
treated differently?
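
To make the distinction concrete, here is a minimal sketch in plain R with
made-up log-ratios for a single gene (four hypothetical patients with three
technical replicates each; the numbers and variable names are invented and
are not from my data). It only compares the spread within patients to the
spread between patients, and averages the technical replicates so that each
patient contributes one value. I am assuming, not claiming, that this kind of
averaging is a reasonable first step.

```r
# Hypothetical log-ratios for ONE gene: 4 patients x 3 technical replicates.
# All numbers are invented for illustration only.
patient <- factor(rep(c("P1", "P2", "P3", "P4"), each = 3))
logratio <- c(1.10, 1.00, 1.20,
              0.20, 0.30, 0.10,
              1.90, 2.00, 2.10,
              0.60, 0.50, 0.70)

# Average variance among technical replicates of the same patient.
within_var <- mean(tapply(logratio, patient, var))

# Collapse technical replicates so that each patient contributes one value.
patient_mean <- tapply(logratio, patient, mean)

# Variance among patients, based on the collapsed values.
between_var <- var(patient_mean)

print(within_var)
print(between_var)
print(patient_mean)
```

In this invented example the within-patient variance is much smaller than
the between-patient variance, which is the situation I have in mind.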

> More good questions. The design matrix that you've written above
> corresponds to a classical interaction parametrization. Here the column
> 'ab' corresponds to extra effect that 'a' has in the presence of 'b'. The
> effect of 'a' by itself (a0-00) is represented by the coefficient 'a' and
> the effect of 'a' in the presence of 'b' (ab-0b) is represented by the sum
> of the coefficients for 'a' and 'ba'. If 'b' is a confounding factor, then
> you probably want to have the effects for 'a' with and without 'b' in your
> heatdiagram. You could do this by
> 
> fit <- lm.series(MA, design)
> contrast.matrix <- makeContrasts(a,a+ba,levels=design)
> fit2 <- contrasts.fit(fit, contrast.matrix)
> eb2 <- ebayes(fit2)
> heatdiagram(stat=eb2$t,coef=fit2$coef)
> 
> This would show whether genes which respond to factor 'a' still respond to
> 'a' in the presence of 'b', and whether in the same direction. Note that
> makeContrasts() is available only in the development version of limma.

Could you help me further in understanding the biological meaning of the
heatdiagram? I know red means the gene is upregulated and green means the
gene is downregulated. Does white mean that the statistical test is not
significant for that gene? In my particular example, what does it mean when
gene i is

i. red for a and green/less red for a+ba? Does it mean b is suppressing the
effect of a on this gene?
ii. red for a and white for a+ba? Does it mean b has no effect on this gene
while a is upregulating it?
iii. red for a and more red for a+ba? Does it mean a and b are both
upregulating gene i?
(I won't consider the cases where a is green because they are essentially
the same.)
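
To check my understanding of the two contrasts, here is a minimal sketch in
plain R (no limma calls) with invented coefficients for three hypothetical
genes, one per case above. The gene names and all numbers are made up for
illustration. I am assuming that the coefficients are log-ratios and that the
columns of the heatdiagram correspond to the contrasts a and a+ba.

```r
# Hypothetical coefficients (log2 fold changes) for three genes, chosen to
# mimic cases i, ii and iii. The numbers are invented, not real data.
coefs <- rbind(
  gene_i   = c(a = 2.0, ba = -1.5),
  gene_ii  = c(a = 2.0, ba = -2.0),
  gene_iii = c(a = 2.0, ba =  1.0)
)

# One column per comparison: "a" is the effect of a without b,
# "a+ba" is the effect of a in the presence of b.
contrast.matrix <- cbind("a"    = c(a = 1, ba = 0),
                         "a+ba" = c(a = 1, ba = 1))

# Each row gives the two values that would be coloured for that gene.
effects <- coefs %*% contrast.matrix
print(effects)
```

If I read the arithmetic correctly, an a+ba value near zero (case ii) would
require ba to be about equal to -a, and a larger a+ba value (case iii) would
require ba to be positive.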

This question may sound a little bit stupid, but I believe a step-by-step
guide would be very helpful for biologists like me in understanding the
results. It would be nice to have more tutorials in the manual of a future
version of limma on how to use heatdiagram, makeContrasts, vennDiagram, etc.
to interpret the biological meaning of the analysis results.

Thanks very much!

Best regards,
Fai