[BioC] globaltest multiple testing correction

Mon Oct 20 10:36:55 CEST 2008

Hello Michael,

I think you are confusing a few things here. It is correct that SAM
uses a permutation method to give q-values, i.e. estimates of the FDR
one would obtain when thresholding at the given value of the
test statistic. That is SAM's specific way of using the permutations,
though. In general a permutation test gives you an ordinary p-value
for each gene, which has to be corrected for multiplicity just like
any other p-value. The main difference is that a permutation method
does not use a theoretical probability distribution to calculate the
p-value, but an empirical distribution obtained by resampling. For
large sample sizes the two distributions, and thus the two p-values
obtained from them, will hardly differ, which is one way to see that
calculating a permutation p-value does not solve the multiple testing
problem per se.
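
To make that concrete, here is a minimal sketch in Python of a
permutation p-value for a single gene. It is only an illustration: the
function name, the choice of the absolute difference in group means as
test statistic, and the add-one correction are my assumptions, not
what SAM or globaltest actually do.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        # Reassign the group labels at random and recompute the statistic.
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Made-up expression values for one gene in two conditions.
control = [10.1, 10.3, 9.8, 10.4, 10.0, 10.2]
treated = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8]
print(permutation_p_value(control, treated, n_perm=2000))
```

The result is one p-value for one gene. Running this for thousands of
genes gives thousands of p-values, which still need a multiplicity
adjustment.
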
The same holds if you test many pathways/gene sets and obtain a p-value
for each. If, for example, you have 100 pathways and call all the ones
with p less than 5% significant, you would expect about 5 significant
pathways by chance even if none of them is really changed, i.e. you
have the same old multiple testing problem. Possibly one could come up
with a SAM-like way of giving q-values for this situation (it is quite
likely that somebody has already come up with that idea too, others
here might know that better), but as far as I know the globaltest
package doesn't do that, so they are absolutely correct in the paper
and vignette about this issue.
One thing to keep in mind is that adjusting p-values for gene set 
analysis is not trivial as the gene sets are likely to overlap.
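
The 100-pathway example can be sketched in Python as follows. This is
a toy simulation under assumptions of mine: the p-values are drawn as
independent uniforms (i.e. no pathway is really changed), and the
Benjamini-Hochberg adjustment is just one common choice that I picked
for illustration, not something prescribed by globaltest. Because it
assumes independent tests, it ignores the overlap problem mentioned
above.

```python
import random


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the
    # adjusted values stay monotone in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


rng = random.Random(1)
# Under the null hypothesis p-values are uniform on [0, 1].
null_p = [rng.random() for _ in range(100)]

raw_hits = sum(p < 0.05 for p in null_p)
bh_hits = sum(p < 0.05 for p in benjamini_hochberg(null_p))
print("unadjusted p < 0.05:", raw_hits)
print("BH-adjusted p < 0.05:", bh_hits)
```

On average the unadjusted count will be around 5 although nothing is
truly changed, and the adjusted count can never exceed it. In R the
same adjustment is available through p.adjust(p, method = "BH").
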

Hope that helps,

Claus

Michael Gormley wrote:
> In the paper and vignette describing the globaltest package, the
> authors mention the need for multiple testing when testing large
> numbers of pathways or functional gene groups.  While I agree the
> number of statistical tests does need to be accounted for, I do not
> understand the need for additional multiple testing correction if the
> permutation method of calculating p-values is used.  This method is
> used often to approximate the false discovery rate, most notably in
> the original implementation of Significance Analysis of Microarrays
> (SAM).  Am I on track with my assessment here or is the additional
> multiple testing correction used as a more accurate way of obtaining
> the true FDR?
>
> Thanks,
> Michael Gormley

-- 
***********************************************************************************
 Dr Claus-D. Mayer                    | http://www.bioss.ac.uk
 Biomathematics & Statistics Scotland | email: claus at bioss.ac.uk
 Rowett Research Institute            | Telephone: +44 (0) 1224 716652
 Aberdeen AB21 9SB, Scotland, UK.     | Fax: +44 (0) 1224 715349
***********************************************************************************

Biomathematics and Statistics Scotland (BioSS) is formally part of The Scottish Crop Research Institute (SCRI), a registered Scottish charity No. SC006662