[BioC] DESeq2 - paired sample, interaction and fold change

Thu Jun 5 10:35:44 CEST 2014

Thanks Michael!

On 05/06/2014 01:49, Michael Love wrote:
> Maybe you left this out because it's a toy example, but we recommend 
> you use dds$treatment=factor(c("A","Ct","A","Ct","A","Ct","A","Ct"), 
> levels=c("Ct","A")), so that the comparisons will be treatment / 
> control. Also with WT / tumor. 
Thanks for the reminder, I did not understood it, now I got why 
sometimes I got the inverse of what I wanted...
>> If I do:
>>> res <- results(dds, name="conditionWT.treatmentCt")
>> I will get the gene for which there is a specific effect of treatment in the tumor, am I right?
> yes (except for a sign change, see note above on factor levels).
>
>> I understood that DESeq2 does make test taking into consideration paired samples, if the pairing information is put as a factor in the design formula (I understood it from http://seqanswers.com/forums/archive/index.php/t-34614.html, post by simon anders, 10-21-2013, 12:28 AM).
> yes, by including the patient variable in the design, differences in
> counts which are due to patient differences are accounted for.
I am still not sure I get it, certainly because I am not familiar with 
expanded design matrices...
the contrast "conditionWT.treatmentCt" will define the vector of 
contrast (named c in the doc/paper) that, multiplied by the vector of 
GLM coefficient Beta, will give the vector of coefficient that will be 
used to calculate the logarithmic posterior, right?
Is it possible to get the contrast vector / pair of contrast vector that 
correspond to a resultName?
>
>> Is it possible to get the individual fold change for each patients?
> Can you be more specific on which fold change? Are you interested in the tumor-treatment interaction term for each patient?
yes
>   How many replicates do you have for each combination of patient x condition x treatment?
3
>   If not many, I'd recommend sticking to the general effect above.
We suspect that not all sample respond to the treatment, and so we would 
like to know if the 'non-significance' of some genes is due to global 
non-response, or 1 sample that is not responding at all (or if they are 
all responding differently, so the variability is too high to be 
significant).
I can get this kind of information by getting the rlogTransformation of 
the counts and comparing the different samples, but these will not be 
the exact values used for the test... I would like to be able to extract 
these last ones.

Many thanks!