[BioC] limma

Thu Apr 7 08:49:40 CEST 2011

Dear Prof Gordon,

Many thanks for your time and for the very clear explanation, which is very
helpful for me and for all scientists involved in microarray analysis.
Several bioinformaticians and statisticians I contacted to discuss this
point told me that they would be very interested in your opinion on this
question, so I will send them the link to read this very useful discussion.

Thank you for the two papers, I have read the first one and started the
second yesterday.

Best wishes,
Seraya

-----Original Message-----
From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU] 
Sent: Thursday, 7 April 2011 03:30
To: Seraya Maouche
Cc: Wei Shi; Bioconductor mailing list
Subject: limma

Dear Seraya,

Wei Shi has put his finger on the issue in a recent reply to this thread. 
Let me elaborate.

Firstly, the true fold change for a gene expressed in one condition but not
the other is Infinity.  While true, this is a bit unhelpful, because an
infinite fold change doesn't tell you whether the gene is highly expressed
or just barely expressed in the condition for which it is expressed.

limma gets around this problem in the following way (assuming that you're
using limma preprocessing as well as linear model fitting).  limma offsets
all the expression values away from zero, so that all genes get a minimum
expression level.

If you use the limma neqc() function to normalize Illumina data, the default
offset is 16, translating to 4 on the log2 scale.  This is why your AveExpr
values will never be less than 4.  So, when a gene is absent in one
condition, and has average expression value x in the other, the fold change
is computed something like:

   logFC = log2( (x+16) / (0+16) )

This means that limma never returns an infinite fold change.  Note also that
the denominator is not noise, rather it is the offset plus a very small
amount of noise.  This means that the estimated fold change is not unstable
or highly variable.  It is quite stable, but biased.  The act of offsetting
the expression values away from zero means that the fold changes tend to be
underestimated, although the bias is negligible for highly expressed genes.
Generally speaking, the gain in noise reduction and statistical power that
arises from using a small offset far outweighs the disadvantage of biasing
the fold changes.  This has been extensively discussed in the recent
paper:

Shi, W, Oshlack, A, and Smyth, GK (2010). Optimizing the noise versus bias
trade-off for Illumina Whole Genome Expression BeadChips. Nucleic Acids
Research 38, e204.
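
As a rough numerical illustration of the offset arithmetic described above,
here is a sketch in base R.  It is not limma's internal code: the intensity
values are made up, and only the offset of 16 comes from the discussion
above.

```r
# Illustrative sketch only: plain base-R arithmetic, not limma internals.
offset <- 16                 # default neqc() offset mentioned above
log2(offset)                 # 4, the floor on the log2 scale

# Hypothetical mean intensities for genes absent in the other condition
x <- c(16, 112, 1008)
logFC_offset <- log2((x + offset) / (0 + offset))   # finite: 1, 3, 6
logFC_raw    <- log2(x / 0)                         # Inf for every gene

# Bias when both conditions are expressed with a true 3-fold change
low  <- log2((48 + offset) / (16 + offset))       # 1, versus log2(3) = 1.585
high <- log2((4800 + offset) / (1600 + offset))   # close to log2(3)
```

The offset values stay finite and rank the three genes by expression level,
whereas the raw ratio cannot tell them apart.  The shrinkage towards zero is
substantial for the weakly expressed pair and small for the highly expressed
pair.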

By the way, you might like to try out the propexpr() function in limma also,
see:

Shi, W, de Graaf, C, Kinkel, S, Achtman, A, Baldwin, T, Schofield, L, Scott,
H, Hilton, D, Smyth, GK (2010). Estimating the proportion of microarray
probes expressed in an RNA sample. Nucleic Acids Research 38, 2168-2176.

You could say to the reviewer: "limma ensures that all probes are assigned
at least a minimum non-zero expression level on all arrays, in order to
minimize the variability of log-intensities for lowly expressed probes. 
Probes that are expressed in one condition but not the other will be assigned
a large fold change for which the denominator is the minimum expression
level.  This approach has the advantage that genes can be ranked by fold
change in a meaningful way, because genes with larger expression changes
will always be assigned a larger fold change."

Best wishes
Gordon

> Date: Tue, 05 Apr 2011 18:05:08 +0200
> From: "Seraya Maouche" <Seraya.Maouche at uk-sh.de>
> To: <Bioconductor at r-project.org>
> Subject: [BioC] Limma
>
> Dear Prof Gordon, dear Bioconductor members,
>
> I have performed gene expression analysis using Limma (Illumina human
> ref8) comparing two types of cells (referred below as cond1 and cond2).
> Based on detection call, I filtered out transcripts which are absent 
> in both types of cells. Transcripts which were expressed only in one 
> cell type were included in the analysis.
>
> I have received the comment below from a reviewer who seems not to agree
> with calculating fold changes for genes expressed only in one condition.
> Would it be possible to have your opinion about this?
>
> Thank you in advance for your time,
> S Maouche
>