[BioC] Unexpected results of differential expression analysis

Sat Jun 22 13:06:25 CEST 2013

Hello,

I am analysing the GEO dataset GSE19736 using SAM (significance analysis for microarrays), particularly the R package called samr but I am not getting the results that I was expecting.

According to the published study, which also uses this tool, there should be 1028 differentially expressed genes (554 up-regulated and 474 down-regulated). When I run the analysis on the data I get a lot more of genes that are differentially expressed. I don't know what I might be doing wrong or where the difference lays.

I am using the following code:
#Extracting files
>cel <- list.celfiles()
>abatch.raw <- read.celfiles(cel)

#Processing
>geneSummaries <- rma(abatch.raw)

#Extracting expression matrix
>expressionmatrix <- exprs (geneSummaries)

#SAM
>samrobj <- samr (data, resp.type="Quantitative", nperms=50, center.arrays=TRUE, assay.type="array")
>delta=2
>samr.plot(samrobj,delta)
>delta.table <- samr.compute.delta.table(samrobj)
>siggenes.table<-samr.compute.siggenes.table(samrobj,2.5, data, delta.table, min.foldchange=1.5, compute.localfdr=TRUE)
>samr.pvalues.from.perms (samrobj$tt, samrobj$ttstar)

If I understood it correctly you can know the number of differentially expressed genes this way for the upregulated:
> siggenes.table$ngenes.up

and this way for the downregulated:
> siggenes.table$ngenes.lo

I find there are 1598 upregulated genes and 1721 downregulated genes, and the number varies greatly depending on the value I give to delta.

I tried assesing differential expression with limma instead, in this case I found that the number of differentially expressed genes was half the expected...

Does anyone have any clue?
Thanks!

 -- output of sessionInfo(): 

R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C               LC_TIME=es_ES.UTF-8       
 [4] LC_COLLATE=es_ES.UTF-8     LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=es_ES.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] compiler  splines   parallel  stats     graphics  grDevices utils     datasets  methods  
[10] base     

other attached packages:
 [1] limma_3.14.4              pd.hugene.1.0.st.v1_3.8.0 GOstats_2.26.0           
 [4] Category_2.26.0           GSEABase_1.22.0           graph_1.38.2             
 [7] annaffy_1.32.0            KEGG.db_2.9.1             GO.db_2.9.0              
[10] preprocessCore_1.20.0     samr_2.0                  matrixStats_0.8.1        
[13] impute_1.34.0             pdInfoBuilder_1.22.0      affxparser_1.30.2        
[16] pd.huex.1.0.st.v2_3.8.0   RSQLite_0.11.4            oligo_1.22.0             
[19] oligoClasses_1.20.0       nnet_7.3-4                mgcv_1.7-18              
[22] Matrix_1.0-6              lattice_0.20-6            KernSmooth_2.23-8        
[25] gcrma_2.30.0              affy_1.36.1               foreign_0.8-50           
[28] DBI_0.2-7                 cluster_1.14.2            survival_2.36-14         
[31] rpart_3.1-54              BiocInstaller_1.8.3       annotate_1.38.0          
[34] AnnotationDbi_1.22.6      Biobase_2.18.0            BiocGenerics_0.6.0       

loaded via a namespace (and not attached):
 [1] affyio_1.26.0         AnnotationForge_1.2.1 Biostrings_2.26.3     bit_1.1-10           
 [5] codetools_0.2-8       ff_2.2-11             foreach_1.4.1         genefilter_1.42.0    
 [9] GenomicRanges_1.10.7  grid_2.15.1           IRanges_1.16.6        iterators_1.0.6      
[13] nlme_3.1-104          RBGL_1.36.2           R.methodsS3_1.4.2     rstudio_0.97.246     
[17] stats4_2.15.1         tools_2.15.1          XML_3.96-1.1          xtable_1.7-1         
[21] zlibbioc_1.4.0       

--
Sent via the guest posting facility at bioconductor.org.