[BioC] Multiple test question in microarray - FDR
naomi at stat.psu.edu
Sun Dec 14 17:23:26 CET 2008
Remember that FDR is a rate, i.e. the expected proportion of false
discoveries among the genes declared significant. If the set of genes
is changed, the FDR will change because the comparison set is
different. This is NOT the same as a p-value, which depends only on
the value of the current test statistic.
The same thing happens with FWER, because these methods control the
probability of making at least one mistake, which clearly depends on
which set of tests is performed.
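
To make the dependence on the comparison set concrete, here is a minimal sketch in plain Python. The ten p-values are hypothetical, and the helper names bh_adjust and bonferroni_adjust are made up for this illustration; they are not taken from any package. The sketch adjusts the same raw p-values once within the full list and once within a four-gene subset.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni_adjust(pvalues):
    """Bonferroni adjusted p-values (a simple FWER-controlling method)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical raw p-values for ten genes (illustration only).
pvals = [0.001, 0.004, 0.012, 0.03, 0.2, 0.35, 0.5, 0.62, 0.8, 0.95]
subset = pvals[:4]  # the same four genes, tested as a smaller set

full_bh = bh_adjust(pvals)
subset_bh = bh_adjust(subset)
full_bonf = bonferroni_adjust(pvals)
subset_bonf = bonferroni_adjust(subset)

# The fourth gene has raw p = 0.03 in both analyses, but its adjusted
# value differs because the set of tests differs.
print("BH:         full =", full_bh[3], " subset =", subset_bh[3])
print("Bonferroni: full =", full_bonf[3], " subset =", subset_bonf[3])
```

The raw p-value of the fourth gene is 0.03 in both runs. Only the adjusted values move, which is exactly why reported q-values are hard to interpret without knowing how many tests were in the set.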
At 03:11 PM 12/13/2008, Sean Davis wrote:
>On Sat, Dec 13, 2008 at 12:36 PM, Wayne Xu <wxu at msi.umn.edu> wrote:
> > Hello,
> > I am not sure this is the right place to ask this question, but it is about
> > microarray data analysis:
> > In a two-group t test, the multiple test Q values depend on the total
> > number of genes in the test. If I filter the gene list first, for example, I
> > only use those genes that have 1.2 fold changes for the t test and multiple
> > test, this gene list is much smaller than the total gene list, then the
> > multiple test q values are much smaller.
> > Do you think the above is a correct approach? People who do not do it that
> > way may think statistical power is lost. But how much power is lost, and how
> > would one calculate the power in this case?
>No, you cannot filter based on fold change. However, you can filter
>based on variance or some other measure that does not depend on the
>two groups being compared. Anything that filters genes based on
>"knowing" the two groups will lead to a biased test. Remember that
>filtering removes genes from consideration in further analysis.
>For further details, there are MANY discussions of this topic in the
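
The bias from fold-change filtering described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not anyone's actual analysis: every gene is null (no true difference), values are taken to be on a log2 scale with known variance 1, and a z-test stands in for the t test so that only the Python standard library is needed. The cutoff of 1.5 and the helper name bh_reject are hypothetical choices for the illustration.

```python
import random
from statistics import NormalDist, mean


def bh_reject(pvalues, alpha=0.05):
    """Number of Benjamini-Hochberg rejections at level alpha (step-up rule)."""
    m = len(pvalues)
    k = 0
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * rank / m:
            k = rank
    return k


random.seed(1)
n_genes, n_per_group = 2000, 4
# Standard error of the difference in group means when sigma = 1 (assumed known).
se = (2.0 / n_per_group) ** 0.5
std_normal = NormalDist()

diffs, pvals = [], []
for _ in range(n_genes):
    # Both groups come from the same distribution: every gene is truly null.
    a = [random.gauss(0, 1) for _ in range(n_per_group)]
    b = [random.gauss(0, 1) for _ in range(n_per_group)]
    d = mean(a) - mean(b)
    diffs.append(d)
    pvals.append(2 * (1 - std_normal.cdf(abs(d / se))))

# Filter on the observed group difference, i.e. a filter that "knows" the groups.
cutoff = 1.5
kept = [p for d, p in zip(diffs, pvals) if abs(d) > cutoff]

print("genes passing the fold-change filter:", len(kept))
print("BH discoveries after filtering (all false):", bh_reject(kept))
print("BH discoveries without filtering:", bh_reject(pvals))
```

With this cutoff, every gene that passes the filter already has a raw p-value below 0.05, so the BH step-up rule on the filtered set rejects all of them even though none is truly different. A filter that ignores the group labels, such as one based on overall variance, does not select on the test statistic in this way.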
> > When people report multiple test Q values, they usually do not mention how
> > many genes are used in this multiple test. You can get different Q values
> > (even use the same method, e.g. Benjamin and Holm adjust method) in the same
> > dataset. Then how can it make sense if the same genes have different Q
> > values?
>A good manuscript should describe in detail the preprocessing and
>filtering steps, the statistical tests used, and the methods for
>correcting for multiple testing. You are correct that many papers do
>not do so.
>As for different q-values in the same dataset using different methods,
>it is important to note that one should not do an analysis, get a
>result, and then, based on that result, go back and redo the analysis
>with different parameters to get a "better" result. It is very
>important that each step of an analysis (preprocessing, filtering,
>testing, multiple-testing correction) be justifiable independent of
>the other steps in order for the results to be interpretable.
Naomi S. Altman 814-865-3791 (voice)
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111