[BioC] Strange FDR values using GSA package with large numbers of permutations

Alexander C Cambon alex.cambon at louisville.edu
Thu Sep 17 17:43:19 CEST 2009

I am using the GSA package for gene set analysis. I have normalize data from a  microarray experiment with four groups and four replicates (16 total arrays).

I downloaded the  set of curated gene sets ("c2.all.v2.5.symbols.gmt")  from the Broad Institute web page MSigDB (I did register)  and read the data into R using the GSA package as follows:

geneset.obj<- GSA.read.gmt("c2.all.v2.5.symbols.gmt") 

GSA.obj3<-GSA(x,y, genenames=gn, genesets=geneset.obj$genesets,  resp.type="Multiclass", nperms=10000)

(I also tried a smaller number of permutations).

Then, I got the gene set list using

GSA.3<-GSA.listsets(GSA.obj3, geneset.names=geneset.obj$geneset.names,FDRcut=.5)

> dim(GSA.3$positive)
[1] 721   5

I noticed that when I used 10000 permutations, the p-values for the gene sets varied all the way from 0.0013 for the top gene set to 0.9434 for the gene sets at the bottom, but the FDR values for all but the last of the 721 gene sets stayed at  0.0749. This did not happen, at least not this severely, with a smaller number of permutations.

Does anyone have an explanation? 

Alexander Cambon
Dept of Bioinformatics and Biostatistics 
School of Public Health and Information Sciences
University of Louisville
Louisville, KY

Here is a sample of some of the first ones (I x'ed out the gene set numbers and names)

Gene_set        Gene_set_name          Score    p-value          FDR

 "xxx"                 "xxx"                            "0.0994"     "0.0013"     "0.0749"
  "xxx"                "xxx"                           "0.1622"      "0.0013"     "0.0749"
 "xx"                  "xxxx"                          "0.3668"     "0.0016"      "0.0749"
 "xxx"               "xxxx"                           "0.2887"     "0.0016"     "0.0749"

Here are the last ones
Gene_set        Gene_set_name      Score       p-value       FDR

 "xxx"                 "xxx"                           "0.0285"     "0.8882"      "0.0749"
 "xxx"                 " xxx"                            "0.024"     "0.901"        "0.0749"
 "xxx"                 "xxx"                            "0.0035"   "0.9434"       "0.0749"

I am using Widows XP
> sessionInfo()
R version 2.9.1 (2009-06-26) 

LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GSA_1.0

More information about the Bioconductor mailing list