[BioC] Unexpected results of differential expression analysis
whuber at embl.de
Sun Jun 23 17:45:45 CEST 2013
did you already contact the authors of that paper for a transcript of their analysis / the exact parameters, software versions, filters, etc. used?
On 22 Jun 2013, at 13:06, Laura [guest] <guest at bioconductor.org> wrote:
> I am analysing the GEO dataset GSE19736 using SAM (significance analysis for microarrays), particularly the R package called samr but I am not getting the results that I was expecting.
> According to the published study, which also uses this tool, there should be 1028 differentially expressed genes (554 up-regulated and 474 down-regulated). When I run the analysis on the data I get a lot more of genes that are differentially expressed. I don't know what I might be doing wrong or where the difference lays.
> I am using the following code:
> #Extracting files
>> cel <- list.celfiles()
>> abatch.raw <- read.celfiles(cel)
>> geneSummaries <- rma(abatch.raw)
> #Extracting expression matrix
>> expressionmatrix <- exprs (geneSummaries)
>> samrobj <- samr (data, resp.type="Quantitative", nperms=50, center.arrays=TRUE, assay.type="array")
>> delta.table <- samr.compute.delta.table(samrobj)
>> siggenes.table<-samr.compute.siggenes.table(samrobj,2.5, data, delta.table, min.foldchange=1.5, compute.localfdr=TRUE)
>> samr.pvalues.from.perms (samrobj$tt, samrobj$ttstar)
> If I understood it correctly you can know the number of differentially expressed genes this way for the upregulated:
> and this way for the downregulated:
> I find there are 1598 upregulated genes and 1721 downregulated genes, and the number varies greatly depending on the value I give to delta.
> I tried assesing differential expression with limma instead, in this case I found that the number of differentially expressed genes was half the expected...
> Does anyone have any clue?
> -- output of sessionInfo():
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-pc-linux-gnu (64-bit)
>  LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C LC_TIME=es_ES.UTF-8
>  LC_COLLATE=es_ES.UTF-8 LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=es_ES.UTF-8
>  LC_PAPER=C LC_NAME=C LC_ADDRESS=C
>  LC_TELEPHONE=C LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
> attached base packages:
>  compiler splines parallel stats graphics grDevices utils datasets methods
>  base
> other attached packages:
>  limma_3.14.4 pd.hugene.1.0.st.v1_3.8.0 GOstats_2.26.0
>  Category_2.26.0 GSEABase_1.22.0 graph_1.38.2
>  annaffy_1.32.0 KEGG.db_2.9.1 GO.db_2.9.0
>  preprocessCore_1.20.0 samr_2.0 matrixStats_0.8.1
>  impute_1.34.0 pdInfoBuilder_1.22.0 affxparser_1.30.2
>  pd.huex.1.0.st.v2_3.8.0 RSQLite_0.11.4 oligo_1.22.0
>  oligoClasses_1.20.0 nnet_7.3-4 mgcv_1.7-18
>  Matrix_1.0-6 lattice_0.20-6 KernSmooth_2.23-8
>  gcrma_2.30.0 affy_1.36.1 foreign_0.8-50
>  DBI_0.2-7 cluster_1.14.2 survival_2.36-14
>  rpart_3.1-54 BiocInstaller_1.8.3 annotate_1.38.0
>  AnnotationDbi_1.22.6 Biobase_2.18.0 BiocGenerics_0.6.0
> loaded via a namespace (and not attached):
>  affyio_1.26.0 AnnotationForge_1.2.1 Biostrings_2.26.3 bit_1.1-10
>  codetools_0.2-8 ff_2.2-11 foreach_1.4.1 genefilter_1.42.0
>  GenomicRanges_1.10.7 grid_2.15.1 IRanges_1.16.6 iterators_1.0.6
>  nlme_3.1-104 RBGL_1.36.2 R.methodsS3_1.4.2 rstudio_0.97.246
>  stats4_2.15.1 tools_2.15.1 XML_3.96-1.1 xtable_1.7-1
>  zlibbioc_1.4.0
> Sent via the guest posting facility at bioconductor.org.
> Bioconductor mailing list
> Bioconductor at r-project.org
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor