[BioC] Multiple test question in microarray - FDR

Wayne Xu wxu at msi.umn.edu
Mon Dec 15 19:44:21 CET 2008

Thanks, Naomi,
I appreciate this mailing list for providing an opportunity for 
discussion. I hope more people will be interested in my question too.


Naomi Altman wrote:
> The ball model does not apply to microarray studies.  (And the 
> probability of drawing the red ball in 20 draws is not 1).
> But FDR does apply to microarray studies, and so does a less discussed 
> concept, the false nondiscovery rate or FNR.
> Suppose I take 20 independent samples of mouse liver tissue - same 
> strain, gender ... and hybridize independently to 20 microarrays - any 
> platform.
> Then arbitrarily divide into 2 groups of size 10.  If there are 10,000 
> genes on the array, you should expect about 1 gene with p-value .0001 
> or less, 10 genes with p-value .001 or less, 100 genes with p-value 
> .01 or less, etc.  Now suppose you take the 100 genes with the highest 
> degree of differential expression and do a PCR study with independent 
> samples.  You should still expect about 1 gene which is significant at 
> p=.01 and 5 genes which are significant at p=.05.
> The problem is - there is no systematic difference between the 
> samples.  You have detected noise - i.e. chance variation.  If you use 
> the same samples to do your PCR, you may get closer to 100% 
> "significance" for the selected genes, because the variation that 
> caused the false detection will still be in the sample unless it was 
> due only to the hybridization.
> FDR is an estimate of the excess of significant findings, compared to 
> what is expected by chance.  You can reduce FDR greatly by doing 
> independent follow-up studies (on another microarray or on another 
> platform such as PCR).  You cannot reduce FDR much by reusing the same 
> samples on a different platform, although you will reduce effects due 
> to technical variation.
> However, FDR reduces your power to detect differential expression.  
> This means that you will have higher FNR if you use multiple 
> comparisons adjustments.  Again, if you do independent follow-up 
> studies, you can reduce FNR.
> The purpose of the FDR computation is to reduce effort wasted on large 
> gene lists which are mostly reporting noise.  But if your gene list is 
> smaller than you think is reasonable, you may certainly follow up a 
> larger set of genes, and sorting by p-value will give you the most 
> reasonable set of genes to follow up.  Again, the only valid follow-up 
> uses independent samples and independent platforms.
> --Naomi
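As a rough illustration of the all-null scenario described above, here is a minimal Python sketch (standard library only). It assumes the p-values are independent and uniform on [0, 1] when there is no real difference between the groups; the function names and the seed are made up for this example.

```python
import random


def expected_null_hits(n_genes, alpha):
    """Expected number of p-values <= alpha when every gene is null."""
    return n_genes * alpha


def simulated_null_hits(n_genes, alpha, seed=0):
    """Count p-values <= alpha in one simulated all-null experiment."""
    rng = random.Random(seed)
    # Under the null hypothesis a p-value is uniform on [0, 1].
    return sum(rng.random() <= alpha for _ in range(n_genes))


if __name__ == "__main__":
    for alpha in (0.0001, 0.001, 0.01, 0.05):
        print(alpha,
              expected_null_hits(10_000, alpha),
              simulated_null_hits(10_000, alpha))
```

The expected counts are just n_genes * alpha (about 1, 10 and 100 for the three thresholds in the message); the simulated counts scatter around those values from run to run.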
> At 02:38 PM 12/14/2008, Wayne Xu wrote:
>> Dear Naomi,
>> I may have a silly question. I have read a few papers on multiple 
>> testing in microarrays, and I understand the points they were trying 
>> to make, but I still have doubts. Since many journal reviewers now 
>> require the FDR for differentially expressed genes from microarrays 
>> in manuscripts, I really want to clear up my doubts.
>> 1). The mathematical model is different from the biological model:
>> The typical math model used to raise the multiple-testing issue is 
>> the following example: 20 balls in a box, 1 red and 19 blue. 
>> The probability of picking the red ball from the box each time is 
>> 1/20, i.e. 0.05. If you draw 20 times, the chance is 0.05 multiplied 
>> by 20, which is 1.
>> Suppose the red ball represents a false positive: if you draw once 
>> the FDR is 0.05; if you draw 20 times the FDR is 1. People bring this 
>> multiple-testing issue into microarray data analysis. But microarrays 
>> differ from this math model in at least two respects:
>> a). The raw P values are determined by the expression values of the 
>> samples and are not affected by the total number of genes.  So this 
>> is different from the above example, where 1 out of 20 is 0.05.
>> b). If you pick a ball and then put it back in the box, you have a 
>> chance of picking exactly the same ball twice or more. But in a 
>> microarray, each gene is tested individually at the same time, and 
>> each gene is tested exactly once.
>> They are obviously different. If this math model is the only reason 
>> that the multiple-testing issue was raised for microarrays, it may 
>> be misleading (I may be silly, since no one else doubts multiple 
>> testing in microarrays?)
>> 2). It does not make biological sense:
>> Suppose a gene called XYZ has a raw P value of 0.00001 in a two-group 
>> t test, and it was validated by a biological test, e.g. RT-PCR. If 
>> the microarray chip has 40,000 genes, then whichever FDR adjustment 
>> method is used, the adjusted P value may be 0.4, or lower or higher. 
>> If I use an FDR cutoff of 0.1, this XYZ gene has a higher FDR and is 
>> not in my list of positive genes of interest.
>> OK, now I play a math game: filter genes by variance or something 
>> else and shrink the gene list to 5000 (since the XYZ gene has a low P 
>> value, suppose it is within the 5000). Then XYZ has a low FDR and is 
>> in my list of differential genes of interest. But this is just a 
>> math game!
>> The biological reality is that XYZ is positive. This is determined 
>> by, for example, 4 control samples and 4 treatment samples: the 
>> means may be very different and the within-group variance very 
>> small, and it is validated by RT-PCR. This reality cannot be changed 
>> by the number of genes tested. The raw P value is close to the 
>> biological reality and represents it well. The multiple test here 
>> just makes you feel happier but does not make biological sense.
>> FDR is a very useful term in many biological cases.  But it does not 
>> seem to be a good fit here for microarrays?
>> Please help to clear it up.
>> Thank you,
>> Wayne
>> -- 
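For the ball example in the message above, a short Python sketch may help separate two quantities that are easy to mix up. It assumes draws with replacement and independent draws; the function names are made up for this illustration.

```python
def prob_at_least_one(p_single, n_draws):
    """P(red ball drawn at least once) in n independent draws with replacement."""
    return 1.0 - (1.0 - p_single) ** n_draws


def expected_count(p_single, n_draws):
    """Expected number of red balls drawn; this is what 0.05 * 20 computes."""
    return p_single * n_draws


if __name__ == "__main__":
    print(round(prob_at_least_one(0.05, 20), 4))  # 0.6415
    print(expected_count(0.05, 20))
```

Multiplying 0.05 by 20 gives the expected number of red draws (about 1), not a probability. The chance of seeing the red ball at least once is 1 - 0.95^20, roughly 0.64, which is the sense in which the probability "is not 1".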
>> Naomi Altman wrote:
>>> Remember that FDR is a rate - i.e. the expected false discovery rate.
>>> If the set of genes is changed, FDR will change because the 
>>> comparison set is different.  This is NOT the same as a p-value, 
>>> which depends only on the value of the current test statistic.
>>> The same thing happens with FWER, because these methods control the 
>>> probability of making at least one mistake, which clearly depends on 
>>> which set of tests are performed.
>>> --Naomi
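To make the dependence on the comparison set concrete, here is a minimal Python sketch of a Benjamini-Hochberg style adjustment. It is an illustration only, not the code of any particular package, and it assumes a toy setting in which gene XYZ has the smallest p-value (0.00001) and every other gene has p = 1; `bh_adjust` is a name invented for this example.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy assumption: XYZ is the top-ranked gene, all other genes have p = 1.
    for n_genes in (40_000, 5_000):
        pvals = [0.00001] + [1.0] * (n_genes - 1)
        print(n_genes, bh_adjust(pvals)[0])
```

In this toy setting the raw p-value of XYZ never changes, but its adjusted value is about 0.4 among 40,000 genes and about 0.05 among 5,000, because the adjustment multiplies by the number of tests divided by the rank.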
>>> At 03:11 PM 12/13/2008, Sean Davis wrote:
>>>> On Sat, Dec 13, 2008 at 12:36 PM, Wayne Xu <wxu at msi.umn.edu> wrote:
>>>> > Hello,
>>>> > I am not sure this is the right place to ask this question, but it
>>>> > is about microarray data analysis:
>>>> >
>>>> > In a two-group t test, the multiple-test q-values depend on the total
>>>> > number of genes in the test. If I filter the gene list first, for
>>>> > example, using only those genes that have 1.2-fold changes for the
>>>> > t test and multiple test, this gene list is much smaller than the
>>>> > total gene list, and the multiple-test q-values are then much smaller.
>>>> >
>>>> > Do you think the above is a correct approach? People who do not do it
>>>> > that way may consider that statistical power may be lost? But how much
>>>> > power is lost, and how can the power be calculated in this case?
>>>> No, you cannot filter based on fold change.  However, you can filter
>>>> based on variance or some other measure that does not depend on the
>>>> two groups being compared.  Anything that filters genes based on
>>>> "knowing" the two groups will lead to a biased test.  Remember that
>>>> filtering removes genes from consideration from further analysis.
>>>> For further details, there are MANY discussions of this topic in the
>>>> mailing list.
>>>> > When people report multiple-test q-values, they usually do not
>>>> > mention how many genes were used in the multiple test. You can get
>>>> > different q-values (even using the same method, e.g. the Benjamini
>>>> > or Holm adjustment method) in the same dataset. Then how can it make
>>>> > sense if the same genes have different q-values?
>>>> A good manuscript should describe in detail the preprocessing and
>>>> filtering steps, the statistical tests used, and the methods for
>>>> correcting for multiple testing.  You are correct that many papers do
>>>> not do so.
>>>> As for different q-values in the same dataset using different methods,
>>>> it is important to note that one should not do an analysis, get a
>>>> result, and then, based on that result, go back and redo the analysis
>>>> with different parameters to get a "better" result.  It is very
>>>> important that each step of an analysis (preprocessing, filtering,
>>>> testing, multiple-testing correction) be justifiable independent of
>>>> the other steps in order for the results to be interpretable.
>>>> Sean
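The point about filters that "know" the two groups can be sketched with a small all-null simulation in Python. This is only an illustration under stated assumptions: every gene is pure noise with standard deviation 1, a two-sided z-test with known variance stands in for the t test (to stay within the standard library), and the function names, sizes and seed are made up.

```python
import random
from statistics import NormalDist, fmean


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fold_change_filter_demo(n_genes=2000, n_per_group=5, keep=200, seed=1):
    """All-null simulation: BH before and after a fold-change filter.

    Returns (full, filtered): BH-adjusted p-values of the kept genes,
    computed on all genes and on the kept genes only.
    """
    rng = random.Random(seed)
    se = (2.0 / n_per_group) ** 0.5  # std. error of a mean difference, sd = 1
    abs_diff, pvals = [], []
    for _ in range(n_genes):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        d = abs(fmean(a) - fmean(b))
        abs_diff.append(d)
        # Two-sided z-test p-value (variance assumed known).
        pvals.append(2.0 * (1.0 - NormalDist().cdf(d / se)))
    # The filter uses the group labels: keep the largest differences.
    kept = sorted(range(n_genes), key=lambda i: abs_diff[i], reverse=True)[:keep]
    adj_all = bh_adjust(pvals)
    full = [adj_all[i] for i in kept]
    filtered = bh_adjust([pvals[i] for i in kept])
    return full, filtered


if __name__ == "__main__":
    full, filtered = fold_change_filter_demo()
    print("smallest adjusted p, all genes:    ", min(full))
    print("smallest adjusted p, after filter: ", min(filtered))
```

Because the filter keeps exactly the genes with the smallest p-values, the adjusted values after filtering can never be larger than those computed on all genes, even though nothing in the simulated data is truly differentially expressed. That is the bias being warned about.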
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: 
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> Naomi S. Altman                                814-865-3791 (voice)
>>> Associate Professor
>>> Dept. of Statistics                              814-863-7114 (fax)
>>> Penn State University                         814-865-1348 (Statistics)
>>> University Park, PA 16802-2111
