[BioC] Harsh results using limma!

Gordon K Smyth smyth at wehi.EDU.AU
Sat Aug 14 02:07:03 CEST 2004


> I think Mick's experiences point out a fundamental problem with current statistical analysis of
> microarray data.  If his data were .2, .2, .2, (dye flips) -.2, -.2, -.2, then Limma would flag
> this gene as highly differentially expressed.  In contrast, when he sees 6.29, 5.54, 0.2, (dye
> flips) -5.27, -4.61, -0.2, Limma does not mark it as differentially expressed.

Actually, it is not true that limma will necessarily rank the first gene higher than the second.
Ordinary t-tests would do so, but limma may well rank the second gene higher, depending on the
information about variability inferred from the whole data set.  Ranking by fold change alone puts
the second gene first, while ordinary t-tests put the first gene first; limma's moderated
t-statistic sits somewhere in between, depending on the dataset.  A typical microarray dataset would
actually lead to the second gene being ranked higher, i.e., to the ranking that you would prefer.
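To make the ranking argument concrete, here is a minimal Python sketch (not limma's own code) that computes the mean log-ratio, the ordinary t-statistic and a moderated t-statistic for the two genes in the quoted message. It assumes the usual moderated-t form, in which a gene's residual variance s^2 on d degrees of freedom is shrunk towards a prior variance s0^2 carrying d0 degrees of freedom. The prior values D0 = 4 and S0_SQ = 0.5 are hypothetical; limma estimates them from the whole data set. A smaller prior variance would push the ranking back towards the ordinary t-test, and a larger d0 pushes it towards ranking by fold change.

```python
import math
import statistics

# Log-ratios quoted in the message. The dye-swap arrays report the opposite
# sign, so they are multiplied by -1 before combining (in limma the swaps
# would instead be encoded as -1 entries in the design matrix).
gene1 = [0.2, 0.2, 0.2, -0.2, -0.2, -0.2]
gene2 = [6.29, 5.54, 0.2, -5.27, -4.61, -0.2]
dye = [1, 1, 1, -1, -1, -1]

# Hypothetical prior, made up for illustration; limma estimates d0 and s0^2
# from all genes by empirical Bayes.
D0 = 4.0
S0_SQ = 0.5


def summarise(raw):
    """Return (mean log-ratio, ordinary t, moderated t) for one gene."""
    y = [v * s for v, s in zip(raw, dye)]
    n = len(y)
    d = n - 1                       # residual degrees of freedom
    mean = statistics.mean(y)       # exact arithmetic internally
    s_sq = statistics.variance(y)   # sample variance, divisor n - 1
    # Ordinary t: unbounded when the sample variance is exactly zero.
    t_ord = math.inf if s_sq == 0 else mean / math.sqrt(s_sq / n)
    # Moderated t: shrink the gene's variance towards the prior value.
    s_tilde_sq = (D0 * S0_SQ + d * s_sq) / (D0 + d)
    t_mod = mean / math.sqrt(s_tilde_sq / n)
    return mean, t_ord, t_mod


for name, raw in [("gene 1", gene1), ("gene 2", gene2)]:
    mean, t_ord, t_mod = summarise(raw)
    print(f"{name}: mean {mean:.3f}, ordinary t {t_ord:.2f}, moderated t {t_mod:.2f}")
```

Under these assumed prior values the ordinary t-test ranks gene 1 first (its t is infinite because all six sign-corrected values are identical), while the moderated t ranks gene 2 first (about 4.3 against about 1.0), in line with its much larger mean log-ratio of 3.685.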

>      As a biologist I would argue that the case for the genes actually being differentially expressed
> is much stronger in the second case.  Yet using modified t-statistic approaches, and with the
> limited number of replicates common in current array experiments, I see array experiments
> "missing" these very interesting high-variance genes all the time.
>     Current analytical techniques put a high premium on consistency of results and a lower premium
> on the strength of differential expression, which is the parameter that biologists would argue is
> the most important.
>      There are a variety of biological reasons why high variance genes should exist and personally
> I think these genes are likely to be the biologically interesting ones that we should be
> looking for on microarrays.
>      I understand why Limma does what it does, and it is a fantastically useful program.
> However, I would suggest to the statisticians reading this message that it would be very
> useful to start developing analytical techniques that could better detect high-variance
> genes.

I agree with the overall point.  Two strategies currently available are:
1. Use spot quality weights.  In the example given above it appears that two of the arrays or
spots have failed to register any worthwhile fold change for a gene which is differentially
expressed on the other arrays.  If this can be identified as being due to low quality spots or
arrays, then the values may be down-weighted in an analysis and the gene will revert to being
highly significant.
2. If small fold changes are not of biological interest to you, then you can require a minimum
magnitude for the fold change as well as looking for evidence of differential expression.
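Both strategies can be sketched in a few lines of Python (again, not limma's own code). Weighted least squares for a single mean replaces the plain mean and residual variance with weighted versions, and the variance of the weighted mean becomes sigma^2 divided by the sum of the weights. Down-weighting the two suspect observations of the second gene therefore raises both its mean log-ratio and its moderated t. The weights, the prior values and the cut-offs (a moderated t of at least 3 and an absolute log-ratio of at least 1, i.e. two-fold if the data are log2 ratios) are all hypothetical choices made for illustration.

```python
import math

# Gene 2 from the quoted message, with the dye-swap arrays already sign-flipped.
y = [6.29, 5.54, 0.2, 5.27, 4.61, 0.2]

# Hypothetical quality weights: the two arrays reporting ~0.2 are assumed to
# have been flagged as poor-quality spots and given weight 0.1.
equal = [1.0] * 6
flagged = [1.0, 1.0, 0.1, 1.0, 1.0, 0.1]

# Hypothetical prior values, made up for illustration; limma estimates these
# from all genes.
D0 = 4.0
S0_SQ = 0.5


def weighted_moderated_t(y, w):
    """Weighted least-squares fit of a single mean, then a moderated t."""
    sw = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    d = len(y) - 1  # residual df for a one-parameter model
    s_sq = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y)) / d
    s_tilde_sq = (D0 * S0_SQ + d * s_sq) / (D0 + d)
    # The variance of a weighted mean is sigma^2 / sum(w).
    return mean, mean / math.sqrt(s_tilde_sq / sw)


def call_de(mean, t, t_cut=3.0, min_log_fc=1.0):
    """Hypothetical rule: enough evidence AND a large enough log-ratio."""
    return abs(t) >= t_cut and abs(mean) >= min_log_fc


for label, w in [("equal weights", equal), ("down-weighted", flagged)]:
    mean, t = weighted_moderated_t(y, w)
    print(f"{label}: mean {mean:.2f}, moderated t {t:.2f}, called DE: {call_de(mean, t)}")

# Gene 1 (all 0.2 after sign correction): even an unbounded t-statistic
# fails a minimum log-ratio of 1.
print("gene 1 called DE:", call_de(0.2, math.inf))
```

With these assumptions the down-weighted fit gives a mean of about 5.18 and a moderated t of about 10.8, compared with about 3.69 and 4.3 under equal weights, while gene 1 fails the fold-change requirement however large its t-statistic.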

Gordon

> David Pritchard


