[BioC] brief question on DESeq2

Wed Apr 10 10:38:18 CEST 2013

 On Tue, 2 Apr 2013 17:37:11 +0200, Michael Love wrote:
> hi Daniel,
>
> On Mon, Apr 1, 2013 at 6:41 PM, daniel.aguirre  wrote:
>
>> Hi,
>>
>> I'm a little puzzled about your 'Differential analysis of count
>> data - the DESeq2 package' protocol.
>>
>> I was trying it with two samples and got the DE results, then I
>> tried the suggested transformations:
>>
>> (being 'des' my previous results, just as it appears in the
>> 'manual')
>>
>> dseBlind
>
> Both varianceStabilizingTransformation and rlogTransformation return
> SummarizedExperiment objects (see the Value section of the man pages
> for these functions), and the transformed values are accessed using
> the assay() accessor; see the GenomicRanges manual pages on
> SummarizedExperiment. (You can run class(dse) or class(rld) to see
> what kind of object you have.)
>
> Sections 7 and 8 in the vignette no longer have to do with DE
> analysis; maybe we should make this clearer in the vignette. Here we
> describe optional transformations of the data which might be useful
> for other applications, such as clustering, which might give nicer
> results when the variance is relatively constant across the range of
> values. For example, we show a hierarchical clustering of the samples
> by transformed values in Figure 8 of the vignette.
>
>  
>
>> many many thanks!!
>>
>> (also, I assume that the analysis takes into account differences in
>> library depth and hence normalizes in this regard?)
>>
>> If I have several conditions (only one sample each, though), should I
>> conduct pairwise analyses, or would it be better to pool them
>> together so that the dispersion model is better? How would the
>> formula be written in that case?
>> cheers!
>
> if you have several conditions for one factor, we address this in
> Section G of the vignette on multi-level conditions. You just need to
> specify which level is the base level.  Then in the DE analysis, the
> other two levels will be compared against this one. We are working to
> implement the contrasts between all 3.
>
> If you have only one replicate per condition, you can treat the
> samples as replicates in order to calculate dispersion. In the
> original DESeq paper, they advise, "While one may not want to draw
> strong conclusions from such an analysis, it may still be useful for
> exploration and hypothesis generation." This is done automatically
> for the 2 sample case, but I still need to generalize this code. You
> can use the code below in the meantime:
>
> The recommended pipeline then, for three samples with something like
> colData(dse)$condition
>

 Hi again, and many thanks. It appears to have worked fine!

 I was expecting that, with no replicates and several conditions of a
 single factor/variable, using a dispersion model built from all my
 samples would increase the number of DE genes found relative to a
 single pairwise analysis.

 However, I found the contrary. Comparing Condition A vs. B (one
 replicate each), using DESeqSummarizedExperimentFromHTSeqCount with A
 and B only, I find 56 DE genes (pairwise comparison). When I build the
 dispersion model etc. including the other two samples, the comparison
 between A and B yields only 13 (here using
 DESeqSummarizedExperimentFromHTSeqCount with A, B, C, D, following the
 proposed pipeline and then retrieving results for A vs. B).
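
 One possible contributing factor can be sketched numerically. When
 unreplicated samples are treated as replicates to calculate dispersion,
 any real difference between conditions is counted as variability, so
 adding samples that differ strongly from A and B may inflate the
 dispersion used for the A vs. B test. The Python snippet below is only
 a rough illustration under stated assumptions: the counts are made up,
 size factors are assumed equal to 1, and the helper moment_dispersion
 is a hypothetical method-of-moments estimate based on
 var = mu + alpha * mu^2. It is not the estimator DESeq2 actually uses,
 which fits a dispersion-mean trend across genes.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Hypothetical method-of-moments estimate of the negative binomial
    dispersion alpha, from the relation var = mu + alpha * mu**2.
    Negative estimates are truncated to zero."""
    mu = mean(counts)
    var = variance(counts)  # sample variance (n - 1 denominator)
    return max((var - mu) / mu ** 2, 0.0)


# Made-up normalized counts for a single gene, one sample per condition.
gene = {"A": 100, "B": 200, "C": 1000, "D": 50}

# Dispersion when only A and B are treated as replicates.
alpha_pair = moment_dispersion([gene["A"], gene["B"]])

# Dispersion when A, B, C and D are all treated as replicates.
alpha_all = moment_dispersion(list(gene.values()))

print(f"A and B only : alpha = {alpha_pair:.3f}")
print(f"A, B, C, D   : alpha = {alpha_all:.3f}")
```

 For this made-up gene the pooled estimate comes out larger than the
 pairwise one, and a larger dispersion makes the same A vs. B fold change
 look less significant. Whether this is what drives the drop from 56 to
 13 genes in the real data is an open question.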

 Would you care to comment on this? Could you give me a hint as to why
 this happens (and why I was wrong to assume the contrary)?

 cheers,
 D