[BioC] deseq2 for pair-wise analysis

Tue Aug 12 16:33:04 CEST 2014

I have just started analyzing my transcriptome assembly data by deseq2. I am finding that my two control samples do not coincide very well. This may be affecting the number of differentially expressed genes if I use the average  of two controls to compare with the treated. Is there a way I can minimize the effects of biological variation on the analysis?

The overall counts are almost the same in all four files (~59-65 million, with an average of 62 million and SD of 3 million). C1 and C2 (untreated) are two different batches of experimental material and one half of them are dosed (D1 and D2; treated). They are kind of paired that way, although they are distinct set of individuals between C1 & D1 or C2 & D2. We actually started with 3 reps but decided to pool the 2nd and 3rd reps to have sufficient RNA for follow up qPCR assays. 

For some reason (could be biological differences between batches), the counts in C2 are 3-5 fold high compared to C1 for only some transcripts. There is a faintly similar trend between D2 and D1, although the treatment has wiped off this difference in many of those transcripts. 

I am wondering if I can use the difference between the pairs (C1&D1 and C2&D2), at least some of the biological variability may be masked allowing us to see the effects of the treatment more clearly on the transcriptome.  

I find from the FAQs that there is a way to do pair-wise analysis in deseq2. I appreciate if you can elaborate the answer a little more. 

I have also another question related to this study. We actually used another closely related species for the study. Since this is small in size, we had to pool all three reps into one.  Transcripts are >75% identical at the DNA sequence level and the gene expression changes are similar between the species in response to the treatment. I am thinking of combining this data as a third rep of the study for deseq2 analysis, instead of throwing out the data. I appreciate to have your input about this step as well.

I am also attaching the code I used (thanks to Dave Wheeler).

Thank you very much for your help.

Subbaiah Chalivendra

 -- output of sessionInfo(): 

R version 3.1.1 (2014-07-10)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] BiocGenerics_0.10.0  GenomeInfoDb_1.0.2   GenomicRanges_1.16.3
[4] IRanges_1.22.9       parallel_3.1.1       stats4_3.1.1        
[7] XVector_0.4.0       

--
Sent via the guest posting facility at bioconductor.org.