[BioC] Q-values way smaller than P-values! Is that kosher?

Sun Jan 12 09:34:20 CET 2014

Hi Josh,

Of course it's not statistically defensible, and you can't use these 
qvalues.  As Mike Love points out, qvalue just isn't intended for small 
sets of p-values like this.

The main problem is that qvalue's estimate of pi0 (the proportion of true 
nulls) is not reliable for small numbers of tests.  For your data, qvalue 
estimates pi0 to be

   > qvalue(p.values)$pi0
   [1] 0.05057643

which is equivalent to saying that most likely all the null hypotheses 
should be rejected.

Some other methods of estimating pi0 are more robust.  For example the 
limma function propTrueNull gives pi0=0.75 for the same data:

   > library(limma)
   > propTrueNull(p.values)
   [1] 0.7506187

The qvalue method of estimating pi0 is sensitive to the size of the 
largest p.values.  If you changed just the 0.79 p-value to 0.99, all the 
qvalues would look very different:

   > data.frame(p.values,q.values=qvalue(p.values)$qvalues)
       p.values  q.values
   1 0.26684513 0.5509513
   2 0.35367528 0.5509513
   3 0.03058514 0.1330221
   4 0.99900000 0.7500975
   5 0.60805832 0.5870052
   6 0.43346672 0.5509513
   7 0.03936944 0.1330221
   8 0.48918122 0.5509513
   9 0.74062659 0.6256105

For very small sets of p-values, it seems to me to be safer to use the 
Benjamini and Hochberg algorithm (available in the p.adjust function). 
BH is more conservative than qvalue, but it doesn't rely on an estimator 
for pi0 and is trustworthy for any number of tests.

Best wishes
Gordon

On Sat, 11 Jan 2014, Michael Love wrote:

> hi Joshua,
>
> While you wait for a definitive answer from the package maintainer, note
> that the methods in the package are designed for datasets with number of
> tests, m, in the thousands.
>
> for example, the reference mentions:
>
> "Because we are considering many features (i.e., *m* is very large), it can
> be shown that..."
>
> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC170937/
>
>
> Mike
>
>
>
>
> On Fri, Jan 10, 2014 at 1:08 PM, Joshua Banta <jbanta at uttyler.edu> wrote:
>
>> Dear listserv,
>>
>> Consider the following vector of P-values:
>>
>> p.values <- c(
>> 0.266845129,
>> 0.353675275,
>> 0.030585143,
>> 0.791285527,
>> 0.60805832,
>> 0.433466716,
>> 0.039369439,
>> 0.48918122,
>> 0.740626586
>> )
>>
>> When I run the qvalue function of the qvalue package, I get a rather
>> surprising outcome: the q-values are much smaller than the corresponding
>> p-values!
>>
>> See the following code below:
>>
>> source("http://bioconductor.org/biocLite.R")
>> biocLite("qvalue")
>> library(qvalue)
>>
>> q.values <- qvalue(p.values)$qvalue
>>
>> comparison <- cbind(p.values, q.values)
>>
>> colnames(comparison) <- c("p.values", "q.values")
>>
>> comparison
>>
>>         p.values    q.values
>>  [1,] 0.26684513 0.037111560
>>  [2,] 0.35367528 0.037111560
>>  [3,] 0.03058514 0.008960246
>>  [4,] 0.79128553 0.040020397
>>  [5,] 0.60805832 0.039540110
>>  [6,] 0.43346672 0.037111560
>>  [7,] 0.03936944 0.008960246
>>  [8,] 0.48918122 0.037111560
>>  [9,] 0.74062659 0.040020397
>>
>> So what is happening here? Can I trust the q-values? In each case they are
>> substantially larger than the P-values. Especially troubling to me are rows
>> 4, 5, and 9.
>>
>> Thanks in advance for any information/advice!
>> -----------------------------------
>> Josh Banta, Ph.D
>> Assistant Professor
>> Department of Biology
>> The University of Texas at Tyler
>> Tyler, TX 75799
>> Tel: (903) 565-5655
>> http://plantevolutionaryecology.org

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}