[BioC] Limma: correct calculation of B statistics (log odds)

Fri Apr 21 00:46:34 CEST 2006

Dear Jose,

There are a number of important points that need to be made here:

1. The moderated t-statistics and p-values returned by limma do not 
use any assumption about the proportion of DE genes. These statistics 
rank the genes in the same order as the B-statistic, so there is 
actually no need to use the B-statistic and therefore no need to rely 
on the prior assumed proportion.

2. The B-statistic is the log-posterior odds of differential 
expression. Naturally the posterior probability has to depend on the 
prior probability. However the ranking of the genes given by the 
B-statistic doesn't depend on the prior probability.

3. It is easy in principle to estimate the proportion of 
differentially expressed genes from a statistical model, but such an 
estimate is all too likely to be practically bogus. Please see 
Section 6.4 of Smyth (2004) where there is a careful discussion of 
how the prior proportion could be estimated but why this is not done 
automatically in limma.

Smyth, G. K. (2004). Linear models and empirical Bayes methods for 
assessing differential expression in microarray experiments. 
Statistical Applications in Genetics and Molecular Biology 3, No. 1, Article 3.
http://www.statsci.org/smyth/pubs/ebayes.pdf

4. In my opinion, the difficulty of getting a biologically meaningful 
estimate of the proportion of truly DE genes is shared by all 
statistical models, not just the one that limma is based on.

5. If you have a prior belief based on sound *biological* grounds 
that the overall proportion of DE genes should be substantially 
greater or less than 0.01, then you are permitted and encouraged to 
set the proportion for yourself.

6. If you really can't resist estimating the proportion of DE genes, 
the best method that I know of it that proposed by Ferkinstad et al 
(2005). This is implemented in limma in the convest() function. In 
limma you can use

   p0 <- convest(fit$p.value[,"coefofinterest"])

to estime the proportion of non-DE genes for your contrast of interest.

Ferkingstad, E., Langaas, M., and Lindqvist, B. (2005). Estimating 
the proportion of true null hypotheses, with application
to DNA microarray data. Journal of the Royal Statistical Society 
Series B, 67, 555-572.
http://www.math.ntnu.no/~mettela/SFG/research.imf

7. Finally, note that the number of truly DE genes is likely to be 
quite a bit greater than the number of statistically significant DE 
genes. This is because there may be many genes which are DE but with 
such small fold changes that you have no realistic chance of 
detecting them. This fact has two consequences. Firstly it means that 
you can't estimate the proportion simply by looking at the number of 
significant genes. Secondly it means that the proportion of truly DE 
may actually be of no real biological interest even if you knew it. 
This is because it includes, perhaps even is mostly made up of, genes 
of no practical interest because the fold changes too small to be important.

Best wishes
Gordon

>Date: Wed, 19 Apr 2006 13:05:59 +0100
>From: J.delasHeras at ed.ac.uk
>Subject: [BioC] Limma: correct calculation of B statistics (log odds)
>To: bioconductor at stat.math.ethz.ch
>
>
>I have been using B values to rank genes in order of more likely to
>less likely (differentially expressed) in LimmaGUI.
>
>I am now using Limma, I noticed the default value for the parameter
>"proportion" (on the function eBayes) is set at 0.01 (expected 1%
>differentially expressed genes). I didn't pay much attention to this
>parameter before, because in LimmaGUI you cannot specify it.
>
>However, now that I use "straight" Limma more I was playing with the
>proportion parameter and it affects the B stats a lot. Therefore I come
>to the question of what's the best way to estimate this parameter.
>
>My first guess is to use the P values (FDR, calculated by BH) to decide
>a cut off, usually 0.05. Then see how many genes are differentially
>expressed according to that rule. And use this observed proportion of
>differentially expressed genes as my proportion parameter.
>
>Is this the correct way to do it?
>
>Jose
>
>--
>Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
>The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
>Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
>Swann Building, Mayfield Road
>University of Edinburgh
>Edinburgh EH9 3JR
>UK