[BioC] limma - FDR adjusted "p-values"

Tue Feb 1 14:37:15 CET 2005

We use limma a lot, and from our point of view having both adjusted and
unadjusted p-values in the topTable() output would be beneficial.

Thanks
Mick

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Gordon K
Smyth
Sent: 01 February 2005 12:31
To: Naomi Altman
Cc: jstorey at u.washington.edu; bioconductor at stat.math.ethz.ch
Subject: [BioC] limma - FDR adjusted "p-values"

> Date: Mon, 31 Jan 2005 09:56:09 -0500
> From: Naomi Altman <naomi at stat.psu.edu>
> Subject: [BioC] limma - FDR adjusted "p-values"
> To: bioconductor at stat.math.ethz.ch
>
> Just a suggestion:
>
> The FDR adjusted "p-values" are called "q-values" in much of the 
> literature.  I suggest that limma follow suit,

It's certainly true that a lot of users have trouble with FDR and with
adjusted p-values in general.  Perhaps you're right that limma should
use the term "q-values".  This would associate p-values with
control/estimation of FWER and q-values with control/estimation of FDR.

The reason I haven't done this so far is that the term "q-value", coined
by John Storey, seems to me to measure something slightly different from
Benjamini and Hochberg adjusted p-values.  I think that John Storey's
q-value uses a slightly different definition of false discovery rate,
namely pFDR, the positive false discovery rate.  Also I think it usually
estimates pFDR rather than formally controlling it.  Although there is a
value "Q" which appears in Benjamini and Hochberg's formulations, and it
is closely related to q-values, it is not exactly the same.  So I have
been reluctant to use the term "q-value" for things which were not quite
the same, as this would cloud the fine meaning of the term.  Perhaps I
am splitting hairs here and should just accept the broad definition of
q-value for FDR or pFDR and p-value for FWER.  Any other opinions?
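
To make the distinction concrete, here is a minimal sketch (in Python,
not limma code) of one way the two quantities can differ numerically.
It assumes the commonly described form of the q-value estimate, namely
the Benjamini-Hochberg adjusted p-value scaled by an estimate of the
proportion of true nulls (pi0).  The function names and the fixed tuning
value lam=0.5 are illustrative assumptions; real implementations may
choose or smooth over this tuning value differently.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the result is monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def pi0_estimate(pvalues, lam=0.5):
    """Estimated proportion of true nulls (fixed lam is an assumption)."""
    n = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (n * (1.0 - lam)))


def q_values(pvalues, lam=0.5):
    """Sketch of a q-value estimate: pi0 times the BH adjusted p-value."""
    pi0 = pi0_estimate(pvalues, lam)
    return [pi0 * a for a in bh_adjust(pvalues)]


pvals = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.7, 0.9]
bh = bh_adjust(pvals)
qv = q_values(pvals)
for p, a, q in zip(pvals, bh, qv):
    print(p, round(a, 4), round(q, 4))
```

Under these assumptions the q-value estimate is never larger than the
BH adjusted p-value, and the two agree when pi0 is estimated as 1.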

I have also thought that perhaps topTable() should label the
p-value/q-value column in the output to indicate which adjustment method
was used to generate the table.

> and also add a line to the
> documentation (it might already be there and I missed it)
>
> "If the number of significant results at level alpha is less than 
> alpha*(number of genes), then the q-value will be 1.0."
>
> It seems like I have to explain this to just about every investigator 
> who runs into this.

I get a lot of questions about this as well.  Actually, the statement
you've made isn't always true, although it usually is.  Even if the
smallest p-value out of n genes is only as small as 1/n, the "fdr"
adjusted p-value is not always 1.  It can be as small as 1/n depending
on the other n-1 p-values.
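
A small worked example may help when explaining this to investigators.
The sketch below is plain Python rather than limma code, and bh_adjust
is a hypothetical helper name; it computes Benjamini-Hochberg adjusted
p-values as the running minimum of p * n / rank taken from the largest
p-value downwards.  It contrasts two sets of n = 10 p-values that share
the same smallest p-value of 1/n.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the result is monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


n = 10

# Case 1: smallest p-value is 1/n and the other n-1 are all 1.
lonely = bh_adjust([1.0 / n] + [1.0] * (n - 1))

# Case 2: smallest p-value is 1/n and so are the other n-1.
crowded = bh_adjust([1.0 / n] * n)

print("others equal to 1:  ", min(lonely))
print("others equal to 1/n:", min(crowded))
```

In the first case every adjusted value is 1; in the second every
adjusted value stays at about 1/n, because the largest p-value is
multiplied by n/n and the running minimum carries that down the list.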

Perhaps the way to go would be for topTable() to output the raw p-values
as well as the adjusted p-values/q-values.  I haven't done this so as to
keep the table as small as possible, but it would prevent users from
being presented with just a list of p-values all equal to 1.  What do
you think?

Gordon

> Naomi S. Altman                                814-865-3791 (voice)
> Associate Professor
> Bioinformatics Consulting Center
> Dept. of Statistics                              814-863-7114 (fax)
> Penn State University                         814-865-1348 (Statistics)
> University Park, PA 16802-2111
