[BioC] question on the cutoff for limma package

Sean Davis sdavis2 at mail.nih.gov
Tue May 24 18:26:49 CEST 2011


On Mon, May 23, 2011 at 8:48 PM, Yi, Ming (NIH/NCI) [C]
<yiming at mail.nih.gov> wrote:
>
> Dear List:
>
> Recently, I used the limma package to analyze some miRNA array
> data.  For one of the contrasts in our limma model, I derived a
> differential list using P.Value < 0.01 as the cutoff, combined with
> a fold-change cutoff.  We noticed that in this particular contrast
> all of the differential miRNAs have rather high adj.P.Val values;
> almost all are 1 or very close to 1 (e.g., 0.973), even though I
> used adj="fdr" in topTable.  The other contrasts in the same model
> do have "normal"-looking adj.P.Val values, ranging from 1 down to
> about 0.01.
>
> From our previous experience, even with a very high adj.P.Val, a
> feature with a decent P.Value (e.g., < 0.01) can sometimes validate
> well.  In this case, we validated two miRNAs from the list, both
> with good P.Value < 0.01 but with rather high adj.P.Val (around
> 0.97 and 1).  One of them validated as truly differential, but the
> other one did not.
>
> I understand this is a rather subjective matter, and we validated
> only two chosen miRNAs in this case (we have run into a similar
> situation before when validating another dataset); many people use
> FDR / adjusted p-value cutoffs varying from 5% to 30%.
> My first question is: what kind of situation could lead to
> adj.P.Val being as high as 0.97 to 1 for every gene in the list
> (there are about 6k features in the dataset)?
>
> What should the cutoffs for P.Value and adj.P.Val be in a situation
> like this?  Should we consider both, or rely specifically on
> adj.P.Val?  In our case, if we rely on adj.P.Val alone, which is so
> high across the board, we cannot choose a single miRNA; yet our
> biological validation did confirm a good one (although we validated
> only two, that is still more success than expected given that none
> of them has a decent adj.P.Val).  If we rely on P.Value (e.g.,
> < 0.01), we do have quite a few miRNAs in the list, but each one
> with a sky-high adj.P.Val, and we could validate only one of the
> two chosen candidates.
>
> Any insight or experience to share with?

Hi, Ming.  The problem with using raw p-values is that there is no
control for multiple testing.  There are many methods to control for
multiple testing, of which one is the 'FDR'.  So, I would tend to rely
on a statistical measure that attempts to control for multiple testing
(such as the FDR); the raw p-values from limma do not do so.  Whether
or not you include a further fold change filter will be a matter of
experimental specifics.  That is not to say that one cannot do what
you have done and "rank" genes by some measure, even those that are
not statistically significant, but one cannot easily conclude that
there is evidence of differential expression unless a
multiple-testing-corrected statistical measure is significant.
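
As a minimal sketch of what that looks like in practice (assuming a
fitted MArrayLM object called fit2 from lmFit()/contrasts.fit()/
eBayes(), with your contrast of interest as coefficient 1; the
thresholds are illustrative, not recommendations):

    library(limma)
    ## pull the full ranked table with FDR-adjusted p-values
    tt <- topTable(fit2, coef = 1, number = Inf, adjust.method = "fdr")
    ## select on the adjusted p-value, optionally with a fold-change filter
    sig <- tt[tt$adj.P.Val < 0.05 & abs(tt$logFC) > 1, ]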

As for your situation, there are multiple reasons that might lead to
lack of evidence of differential expression.  First, there may truly
be no difference for a contrast.  Second, technical artifacts or noise
may make such a difference difficult or impossible to detect.  Third
(and related to the second), the sample size may be too small to
detect a difference.  Remember that failing to reject the null
hypothesis (of no differential expression) is not the same thing as
proving it; typically, we cannot prove the null hypothesis.
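
To see how every adj.P.Val in a list can end up at 0.97 to 1: when
the raw p-values are roughly uniform (i.e., consistent with the
global null), the BH/FDR adjustment pushes essentially everything
toward 1, even though a 6000-feature array will produce around 60
raw p-values below 0.01 by chance alone.  A quick simulated
illustration (simulated data, not your experiment):

    set.seed(1)
    p <- runif(6000)                    # raw p-values under the global null
    sum(p < 0.01)                       # roughly 60 "hits" by chance
    summary(p.adjust(p, method = "BH")) # adjusted values pile up at/near 1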

Some of the more statistically-minded might have clearer explanations
for some of what I said above, but I think the rule of thumb is to
rely on multiple-testing-corrected p-values, not on uncorrected
p-values, for determining statistical significance.
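
Within limma itself, decideTests() is a convenient way to apply that
rule of thumb across all of your contrasts at once; a sketch, again
assuming a hypothetical fitted object fit2:

    ## counts of down/not-significant/up features per contrast at 5% FDR
    summary(decideTests(fit2, adjust.method = "BH", p.value = 0.05))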

Sean


