[BioC] siggenes permutation count problem

James W. MacDonald jmacdon at med.umich.edu
Sat Jan 7 14:33:35 CET 2006

paul.boutros at utoronto.ca wrote:
> Hello,
> I'm having some trouble interpreting how/why siggenes performed a certain 
> number of permutations on my dataset.  This is an affy dataset that was 
> normalized by:
> data <- ReadAffy(filenames=cel.files, phenoData="phenodata.txt");
> eset <- expresso(data, normalize.method="constant", bgcorrect.method="none", 
> pmcorrect.method="mas", summary.method="avgdiff");
> I realize that the normalization is a bit unusual: this study is actually 
> testing a range of normalization methods.  This is a two-class experiment with 
> 3 arrays in each group:
> Expression Set (exprSet) with 
>         22690 genes
>         6 samples
>                  phenoData object with 1 variables and 6 cases
>          varLabels
>                 Group: read from file
> [1] 1 1 0 1 0 0
> So to do a SAM-like analysis I used:
> SAM.data <- sam(data=eset, cl=design, var.equal=FALSE, B=1000);
> And I expected there to be 6! = 720 total possible permutations.  So I was 
> surprised to get this output:
>>SAM.data <- sam(data=eset, cl=design, var.equal=FALSE, B=1000);
> We're doing 20 complete permutations
> Why does siggenes think there are only 20 complete permutations to be used?  
> Have I done something wrong, or is my understanding of how the permutations are 
> done in error?

It's a combination of incorrect terminology and (possibly) a 
misunderstanding on your part. First, there *are* 720 possible 
permutations of the six arrays, but we don't care about the ordering 
within each group since we are simply comparing group means. What we 
really want here are combinations, and there are only 20 combinations 
when you have 6 samples and you are choosing three for each group 
(see ?choose). If you did all 720 permutations it would result in only 
20 unique t-statistics, each repeated 36 times (the 3! * 3! orderings 
within the two groups).
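
To make the counting concrete, here is a minimal base-R sketch of the 
arithmetic above. The variable names (n_samples, n_per_group, group1) 
are just for illustration and are not part of siggenes.

```r
n_samples   <- 6   # arrays in the experiment
n_per_group <- 3   # arrays in each of the two classes

factorial(n_samples)             # 720 orderings of the six arrays
choose(n_samples, n_per_group)   # 20 distinct ways to pick the first group
factorial(n_per_group)^2         # 36 orderings that leave group membership unchanged

# Enumerate the distinct relabellings explicitly:
# each column lists the arrays assigned to the first group.
group1 <- combn(n_samples, n_per_group)
ncol(group1)                     # 20
```

Note that 720 / 36 = 20, which is where the "20 complete permutations" 
in your output comes from.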

This terminology is a holdover from SAM, which AFAIK really did do the 
permutations rather than combinations. However, this is very wasteful of 
computing time, especially when the number of replicates gets large, so 
siggenes rightly does the combinations and abuses terminology by calling 
them 'complete permutations'.
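
If you want to convince yourself of the redundancy, the sketch below 
brute-forces all 720 orderings of your class labels for a single 
simulated gene. Two assumptions to flag: the expression values are made 
up with rnorm(), and a plain Welch t-test from base R stands in for the 
statistic that siggenes actually computes. The helpers all_perms() and 
welch_t() are hypothetical names written for this example only.

```r
# Build every ordering of a vector (treating positions as distinct).
all_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) {
    cbind(v[i], all_perms(v[-i]))
  }))
}

# Welch t-statistic for one gene under a given labelling (stand-in statistic).
welch_t <- function(x, labels) {
  unname(t.test(x[labels == 1], x[labels == 0], var.equal = FALSE)$statistic)
}

set.seed(1)
x  <- rnorm(6)                 # simulated values for one hypothetical gene
cl <- c(1, 1, 0, 1, 0, 0)      # class labels as in the original post

perms <- all_perms(cl)         # one row per ordering of the labels
t_all <- apply(perms, 1, function(lab) welch_t(x, lab))

nrow(perms)                    # 720 orderings were evaluated
nrow(unique(perms))            # only 20 distinct labellings among them
length(unique(signif(t_all, 10)))  # number of distinct t-statistics
```

With continuous data the last line should agree with the number of 
distinct labellings, so the other 700 evaluations add nothing but 
computing time.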



> This is R 2.2.1 and siggenes 1.4.0 on WinXP.
> Paul

James W. MacDonald
University of Michigan
Affymetrix and cDNA Microarray Core
1500 E Medical Center Drive
Ann Arbor MI 48109

