[BioC] DESeq2 : Using Normalised ReadCount matrix from EDAseq in DESeq2

Thu Jun 26 15:16:24 CEST 2014

Hi,

I want to use a read count matrix normalised with EDASeq as input to a downstream DESeq2 analysis, but it is not clear to me from the vignette how to do this.

These are the steps I followed:

## EDASeq - normalising the count matrix by GC content

> dataWithin <- withinLaneNormalization(data, "pct_gc", which = "full")
> dataNorm <- betweenLaneNormalization(dataWithin, which = "full")

## I normalised the counts themselves instead of generating the offsets as described in the EDASeq vignette

### DESeq2

> ?? 
> dds <- estimateDispersions(dds)
> dds <- nbinomWaldTest(dds)
> res <- results(dds)

I don't know how to create a normalization factor matrix. The DESeq2 vignette, on the other hand, mentions that normalization factors should be on the scale of the counts, like size factors, and unlike offsets, which are typically on the scale of the predictors (i.e. the logarithmic scale for the negative binomial GLM).
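
To make the scale distinction concrete, here is a minimal base-R sketch. The matrices `off` and `cnt` are hypothetical, with made-up values, and the sketch assumes an offset that is added to the linear predictor. The sign convention of an offset depends on the package that produced it and is not settled by this example.

```r
# Hypothetical log-scale offsets for 3 genes x 2 samples (made-up values)
off <- matrix(c(0.0, 0.2, -0.1,
                0.1, -0.2, 0.3),
              nrow = 3,
              dimnames = list(paste0("gene", 1:3), c("s1", "s2")))

# An offset enters the GLM additively on the log scale:
#   log(mu) = X %*% beta + offset
# A count-scale factor is multiplicative:
#   mu = factor * q
# so the two are related through exp() and log().
nf <- exp(off)

# Made-up raw counts with the same shape
cnt <- matrix(c(10, 200, 35,
                12, 150, 50),
              nrow = 3, dimnames = dimnames(off))

# Dividing by a count-scale factor is the same as
# subtracting the offset on the log scale
norm_counts <- cnt / nf
stopifnot(isTRUE(all.equal(log(norm_counts), log(cnt) - off)))
```

A count-scale factor matrix is always strictly positive, whereas a log-scale offset can be negative, which is a quick way to tell which scale a given matrix is on.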

So in that case, should I generate the offset values from EDASeq instead, i.e.:

> dataWithin <- withinLaneNormalization(data, "pct_gc", which = "full", offset = TRUE)
> dataNorm <- betweenLaneNormalization(dataWithin, which = "full", offset = TRUE)
> EDASeqNormFactors <- exp(-1 * offst(dataNorm))
> normalizationFactors(dds) <- EDASeqNormFactors
> dds <- estimateDispersions(dds)
> dds <- nbinomWaldTest(dds)
> res <- results(dds)
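
The DESeq2 half of the steps above can be sketched end to end on simulated data. This is only a sketch under stated assumptions: the counts come from DESeq2's `makeExampleDESeqDataSet()`, and `off` is a hypothetical random matrix standing in for `offst(dataNorm)`, so the results carry no biological meaning. The sign in `exp(-1 * off)` simply follows the snippet above, and the centring of each row on a geometric mean of 1 is an extra step I am assuming is appropriate, not something taken from the steps above.

```r
library(DESeq2)

set.seed(1)

# Simulated data set: 1000 genes x 6 samples, design ~ condition
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Hypothetical log-scale offsets, standing in for offst(dataNorm)
off <- matrix(rnorm(1000 * 6, sd = 0.1), nrow = 1000,
              dimnames = dimnames(counts(dds)))

# Move to the count scale; the sign follows the snippet above
nf <- exp(-1 * off)

# Assumption: centre each gene's factors on a geometric mean of 1
# so that the mean of normalised counts stays close to the raw mean
nf <- nf / exp(rowMeans(log(nf)))

# The matrix must have the same dimensions as the counts and be positive
normalizationFactors(dds) <- nf

# With normalization factors set, size factors are not estimated
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)
res <- results(dds)
```

Because the normalization factors are supplied as a gene-by-sample matrix, `estimateSizeFactors()` is not called anywhere in this sketch.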

 -- output of sessionInfo(): 

R version 3.1.0 (2014-04-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] DESeq2_1.4.5            RcppArmadillo_0.4.300.0 Rcpp_0.11.1            
 [4] EDASeq_1.10.0           aroma.light_2.0.0       matrixStats_0.8.14     
 [7] ShortRead_1.22.0        GenomicAlignments_1.0.1 BSgenome_1.32.0        
[10] Rsamtools_1.16.0        GenomicRanges_1.16.3    GenomeInfoDb_1.0.2     
[13] Biostrings_2.32.0       XVector_0.4.0           IRanges_1.22.7         
[16] BiocParallel_0.6.1      Biobase_2.24.0          BiocGenerics_0.10.0    

loaded via a namespace (and not attached):
 [1] annotate_1.42.0      AnnotationDbi_1.26.0 BatchJobs_1.2       
 [4] BBmisc_1.6           bitops_1.0-6         brew_1.0-6          
 [7] codetools_0.2-8      DBI_0.2-7            DESeq_1.16.0        
[10] digest_0.6.4         fail_1.2             foreach_1.4.2       
[13] genefilter_1.46.1    geneplotter_1.42.0   grid_3.1.0          
[16] hwriter_1.3          iterators_1.0.7      lattice_0.20-29     
[19] latticeExtra_0.6-26  locfit_1.5-9.1       plyr_1.8.1          
[22] RColorBrewer_1.0-5   R.methodsS3_1.6.1    R.oo_1.18.0         
[25] RSQLite_0.11.4       sendmailR_1.1-2      splines_3.1.0       
[28] stats4_3.1.0         stringr_0.6.2        survival_2.37-7     
[31] tools_3.1.0          XML_3.98-1.1         xtable_1.7-3        
[34] zlibbioc_1.10.0     

--
Sent via the guest posting facility at bioconductor.org.