[BioC] question on the cutoff for limma package

Yi, Ming (NIH/NCI) [C] yiming at mail.nih.gov
Tue May 24 20:33:33 CEST 2011


Hi, Sean:

Thanks a lot for your very nice comments and diagnosis of the issues, and for sharing them with me. Yes, in general I would rely more on multiple-testing-based statistics such as the FDR or adjusted p-values. However, in this particular situation, if we do that we do not have a single candidate for further experiments, and the bench scientists have nothing to pursue. What makes it trickier is that, as I mentioned, we did pick some candidates based on raw p-values, and the biologists successfully validated some of them with a multiple-sample qPCR approach and are now actively pursuing them.

My concern with multiple-testing correction is that in some cases it tends to be too stringent (this may also depend on the correction method; I used the FDR or BH method, which is popular), especially in our case where it leaves us with no candidates at all. In fact, I do hear of similar cases from others, where candidates selected based on raw p-values sometimes work quite well in terms of validation rate.
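
Just to be concrete about what I mean by the BH/FDR adjustment, here is a toy sketch (the raw p-values below are made up purely for illustration; topTable with adjust.method="fdr" uses the same p.adjust machinery):

## Toy example of Benjamini-Hochberg (BH/FDR) adjustment; p-values are invented.
raw.p <- c(0.0005, 0.003, 0.008, 0.02, 0.4, 0.7, 0.95)
adj.p <- p.adjust(raw.p, method = "BH")   # "fdr" is an alias for "BH"
data.frame(raw.p, adj.p)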

Yes, the third reason you mentioned, sample size (5 vs. 5 for the comparison in our case; these are mouse primary tumor-derived cell line clones, so variation is indeed much lower than among human samples), does apply to us. But for the biologists and the qPCR they use for validation, that level of replication already seems to make them happy to pursue candidates further.

Thanks for sharing!

Best

Ming

-----Original Message-----
From: Davis, Sean (NCI) On Behalf Of Davis, Sean (NIH/NCI) [E]
Sent: Tuesday, May 24, 2011 12:27 PM
To: Yi, Ming (NIH/NCI) [C]
Cc: Bioconductor mailing list
Subject: Re: [BioC] question on the cutoff for limma package

On Mon, May 23, 2011 at 8:48 PM, Yi, Ming (NIH/NCI) [C]
<yiming at mail.nih.gov> wrote:
>
> Dear List:
>
> Recently, I used the limma package to analyze some miRNA array data. For one of the contrasts in our limma model, I derived a differential list using P.Value < 0.01 as the cutoff, combined with a fold-change cutoff. We noticed that in this particular contrast, all of the differential miRNAs have rather high adj.P.Val; almost all are 1 or very close to 1 (e.g., 0.973), even though I used adj="fdr" in topTable. The other contrasts in the same limma model do have "normal"-looking adj.P.Val, ranging from 1 down to about 0.01.
>
> From our previous experience, sometimes even with a very high adj.P.Val but a decent P.Value (e.g., < 0.01), we can get good validation. In this case, we have now validated two miRNAs from the list, both with good P.Value < 0.01 but rather high adj.P.Val (both around 0.97 or 1). One of them validated as a good differential miRNA, but the other did not (we could not validate it as differential).
>
> I understand this is a more subjective aspect, and we only validated two of the chosen miRNAs in this case (we have encountered a similar situation before when validating another dataset). Many people commonly use FDR or adjusted p-value cutoffs ranging from 5% to 30%.

> My first question is: what kind of situation could lead to adj.P.Val values as high as 0.97 to 1 for all of the genes in the list (there are about 6k features in the dataset)?
>
> What should the cutoffs for P.Value and adj.P.Val be in a situation like this? Should we consider both, or focus specifically on adj.P.Val? In our case, if we rely only on adj.P.Val as the cutoff, which is so high across the board, we cannot choose a single miRNA; yet our biological validation did confirm a good one (although we validated only two, that is still much better than expected, considering that none of them has a decent adj.P.Val). If we rely on P.Value (e.g., < 0.01), we do have quite a few miRNAs in the list, but each with a sky-high adj.P.Val, and we could only validate 1 of the 2 chosen candidates as a good one.
>
> Any insight or experience to share?

Hi, Ming.  The problem with using raw p-values is that there is no
control for multiple testing.  There are many methods to control for
multiple testing, of which one is the 'FDR'.  So, I would tend to rely
on a statistical measure that attempts to control for multiple testing
(such as the FDR); the raw p-values from limma do not do so.  Whether
or not you include a further fold change filter will be a matter of
experimental specifics.  That is not to say that one cannot do what
you have done and "rank" genes by some measure, even genes that are
not statistically significant, but one cannot easily conclude that
there is evidence of differential expression unless a
multiple-testing-corrected statistical measure is significant.
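
To make that concrete, here is roughly what the two selection strategies look like in limma; the object 'fit2' and the contrast name are only placeholders for whatever you fit in your own analysis:

library(limma)
## 'fit2' stands for your eBayes() result after lmFit()/contrasts.fit();
## the coef name "groupB-groupA" is only a placeholder.
tt  <- topTable(fit2, coef = "groupB-groupA", number = Inf, adjust.method = "fdr")
sig <- tt[tt$adj.P.Val < 0.05, ]  # selection that controls for multiple testing
raw <- tt[tt$P.Value < 0.01, ]    # the raw-p-value selection, with no such control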

As for your situation, there are multiple reasons that might lead to
lack of evidence of differential expression.  First, there may truly
be no difference for a contrast.  Second, technical artifacts or noise
may make such a difference difficult or impossible to detect.  Third
(and related to the second), the sample size may be too small to
detect a difference.  Remember that not rejecting the null hypothesis
(of no differential expression) is not the same thing as proving the
null hypothesis; we cannot prove the null hypothesis, typically.
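
As a rough, purely illustrative calculation on that third point (the effect size and standard deviation below are hypothetical, not from your data), a standard two-sample power calculation shows how limited 5 vs. 5 can be:

## Illustrative only: power of a two-sample t-test with n = 5 per group,
## assuming a one-standard-deviation difference between groups.
power.t.test(n = 5, delta = 1, sd = 1, sig.level = 0.05)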

Some of the more statistically-minded might have clearer explanations
for some of what I said above, but I think the rule-of-thumb is to
rely on multiple-testing-corrected p-values and not on uncorrected
p-values for determining statistical significance.

Sean


