[BioC] Limma: correct calculation of B statistics (log odds)

Gordon Smyth smyth at wehi.EDU.AU
Fri Apr 21 00:46:34 CEST 2006

Dear Jose,

There are a number of important points that need to be made here:

1. The moderated t-statistics and p-values returned by limma do not
use any assumption about the proportion of DE genes. These statistics
rank the genes in the same order as the B-statistic, so there is
actually no need to use the B-statistic and therefore no need to rely
on the prior assumed proportion.

2. The B-statistic is the log-posterior odds of differential
expression. Naturally the posterior probability has to depend on the
prior probability. However the ranking of the genes given by the
B-statistic doesn't depend on the prior probability.

3. It is easy in principle to estimate the proportion of
differentially expressed genes from a statistical model, but such an
estimate is all too likely to be practically bogus. Please see
Section 6.4 of Smyth (2004) where there is a careful discussion of
how the prior proportion could be estimated but why this is not done
automatically in limma.

Smyth, G. K. (2004). Linear models and empirical Bayes methods for
assessing differential expression in microarray experiments.
Statistical Applications in Genetics and Molecular Biology 3, No. 1, Article 3.
http://www.statsci.org/smyth/pubs/ebayes.pdf

4. In my opinion, the difficulty of getting a biologically meaningful
estimate of the proportion of truly DE genes is shared by all
statistical models, not just the one that limma is based on.

5. If you have a prior belief based on sound *biological* grounds
that the overall proportion of DE genes should be substantially
greater or less than 0.01, then you are permitted and encouraged to
set the proportion for yourself.

6. If you really can't resist estimating the proportion of DE genes,
the best method that I know of it that proposed by Ferkinstad et al
(2005). This is implemented in limma in the convest() function. In
limma you can use

p0 <- convest(fit\$p.value[,"coefofinterest"])

to estime the proportion of non-DE genes for your contrast of interest.

Ferkingstad, E., Langaas, M., and Lindqvist, B. (2005). Estimating
the proportion of true null hypotheses, with application
to DNA microarray data. Journal of the Royal Statistical Society
Series B, 67, 555-572.
http://www.math.ntnu.no/~mettela/SFG/research.imf

7. Finally, note that the number of truly DE genes is likely to be
quite a bit greater than the number of statistically significant DE
genes. This is because there may be many genes which are DE but with
such small fold changes that you have no realistic chance of
detecting them. This fact has two consequences. Firstly it means that
you can't estimate the proportion simply by looking at the number of
significant genes. Secondly it means that the proportion of truly DE
may actually be of no real biological interest even if you knew it.
This is because it includes, perhaps even is mostly made up of, genes
of no practical interest because the fold changes too small to be important.

Best wishes
Gordon

>Date: Wed, 19 Apr 2006 13:05:59 +0100
>From: J.delasHeras at ed.ac.uk
>Subject: [BioC] Limma: correct calculation of B statistics (log odds)
>To: bioconductor at stat.math.ethz.ch
>
>
>I have been using B values to rank genes in order of more likely to
>less likely (differentially expressed) in LimmaGUI.
>
>I am now using Limma, I noticed the default value for the parameter
>"proportion" (on the function eBayes) is set at 0.01 (expected 1%
>differentially expressed genes). I didn't pay much attention to this
>parameter before, because in LimmaGUI you cannot specify it.
>
>However, now that I use "straight" Limma more I was playing with the
>proportion parameter and it affects the B stats a lot. Therefore I come
>to the question of what's the best way to estimate this parameter.
>
>My first guess is to use the P values (FDR, calculated by BH) to decide
>a cut off, usually 0.05. Then see how many genes are differentially
>expressed according to that rule. And use this observed proportion of
>differentially expressed genes as my proportion parameter.
>
>Is this the correct way to do it?
>
>Jose
>
>--
>Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
>The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
>Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360