[BioC] LIMMA : design (1, 2, 3, 3 ) , I got EXCITING results, what could be the logic, since i have 2 replicates for 3rd group only ?

Wed Apr 27 16:02:29 CEST 2005

Just to elaborate slightly on Sean's response, the idea of the empirical Bayes "pooling" strategy
used in limma is that you don't need to choose between using a fold change strategy or using
t-tests.  Rather, the software moves you along a sliding scale between these two strategies depending on
how much information there is about the variances in the data and how different the variances seem
to be.  In your situation, with only 1 df for error, the limma rankings will usually be much
closer to fold-change ranking than to a t-test ranking.  Even here the moderated t-statistic
approach is usually still preferable over ranking on fold change because genes for which the
available replicates disagree will get down-weighted.
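
To make the sliding scale concrete, here is a minimal base-R sketch of the variance shrinkage behind the moderated t-statistic: each gene's residual variance is averaged with a prior variance, weighted by their degrees of freedom. The prior values d0 and s0_sq and the three per-gene variances are made-up numbers for illustration; in practice limma estimates the prior from all genes on the arrays.

```r
# Sketch of the variance shrinkage behind a moderated t-statistic.
# moderated_var() is a helper written for this illustration, not a limma function.
moderated_var <- function(s_sq, d, s0_sq, d0) {
  (d0 * s0_sq + d * s_sq) / (d0 + d)
}

# Hypothetical per-gene residual variances, each based on 1 residual df.
s_sq  <- c(geneA = 0.001, geneB = 0.04, geneC = 1.0)
d     <- 1      # residual df with the (1, 2, 3, 3) design
s0_sq <- 0.04   # prior variance (assumed value)
d0    <- 4      # prior df (assumed value)

s_tilde_sq <- moderated_var(s_sq, d, s0_sq, d0)
print(s_tilde_sq)

# The two ends of the scale:
# d0 = 0 returns the raw variances (ordinary t-test ranking);
# a very large d0 gives nearly the same variance for every gene,
# so the ranking approaches a fold-change ranking.
ordinary_end    <- moderated_var(s_sq, d, s0_sq, 0)
fold_change_end <- moderated_var(s_sq, d, s0_sq, 1e6)
```

With only 1 residual df the prior dominates, so the moderated variances are much more alike than the raw ones, yet geneC, whose replicates disagree most, still gets the largest variance and hence a smaller statistic for the same fold change.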

Gordon

----- original message ----------
Sean Davis sdavis2 at mail.nih.gov
Wed Apr 27 12:07:28 CEST 2005

On Apr 26, 2005, at 9:51 PM, Saurin Jani wrote:
> Hi Adai,
>
> Yes, you are right. I have 4 samples :
>
> Group1 = Growth Effect for Day 1 : 1 Affy GeneChip.
> Group2 = Growth Effect for Day 2 : 1 Affy GeneChip.
> Group3 = Growth Effect for Day 3 : 2 Affy GeneChips.
>
> so, my design matrix is:
> design <- model.matrix(~ -1+factor(c(1,2,3,3)));
>
> LIMMA did not give any error or warning even though it has 1
> sample per group! (I thought something similar, since
> it needs technical replicates per group to make a
> decision.) The results are very interesting. I have
> many genes at 0.01 FDR, which is very good.
>
> Somehow, I don't understand the logic. Do you think
> this is a valid design? Or do you think I should go by
> fold-change logic? Please let me know.

Limma can and does use a "pooled" variance estimate, so the estimate of
variance used here, though not "within-gene", is probably not too far
off (i.e., you can have an estimate of the variance with only one array
in a group).  Without replicates, that estimate is certainly subject to
more error than with replicates.  However, fold-change is probably even
one more step away from any statistical footing, as it includes NO
estimate of variance and probably offers little or no advantage.  That
said, high fold-changes and low p-values are probably more likely than
low fold-changes and higher p-values to be real, but I don't think that
either measure is likely to be very robust.
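
As a rough illustration of where the variance information comes from in this design, the base-R sketch below builds the same design matrix and fits one gene. The four expression values are hypothetical log2 intensities, not data from this thread.

```r
# Design from the original question: one array each for days 1 and 2, two for day 3.
group  <- factor(c(1, 2, 3, 3))
design <- model.matrix(~ -1 + group)

# Residual df = number of arrays minus number of estimated coefficients.
resid_df <- nrow(design) - qr(design)$rank

# Hypothetical log2 expression values for one gene on the four arrays.
y   <- c(7.0, 7.5, 9.1, 8.9)
fit <- lm.fit(design, y)

# Arrays 1 and 2 are fitted exactly, so only the day-3 pair contributes
# to the residual variance.
s_sq <- sum(fit$residuals^2) / resid_df
print(resid_df)
print(s_sq)
```

The single residual df comes entirely from the two day-3 arrays, which is why the per-gene variance is so noisy and why borrowing information across genes matters here.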

Arguments can proceed down the statistical road, but obviously the best
(only) option that will produce statistically meaningful results would
be to do more arrays.  Short of that, I think the fold-change and limma
are likely in practice to produce similar ordered lists that could, at
best (and with the knowledge that the order is to be taken with a large
grain of salt), be used to guide biologic validation or further
experiments.

Sean