[BioC] Using DESeq2: experimental design and extracting results

Michael Love michaelisaiahlove at gmail.com
Wed Jul 30 19:24:38 CEST 2014


hi Sridhar,

The base mean is the mean of normalized counts for all samples.

A few weeks ago I wrote a reply showing how to calculate the mean for
each group, and you can easily modify this code to get the standard
deviation or any summary.

https://stat.ethz.ch/pipermail/bioconductor/2014-July/060631.html

Mike

On Wed, Jul 30, 2014 at 1:13 PM, Sridhar A Malkaram
<smalkaram at wvstateu.edu> wrote:
> Hi Michael,
>
> I'd like your suggestions on an additional question related to this
> previous thread on DESeq2.
>
> You might be aware that when publishing differential expression results,
> it is a convention to provide the mean and standard deviation values
> (after the normalization) for the genes being tested. The DE table
> output output by DESeq has a column "baseMean". Is this equivalent to
> the mean expression? Is there a way to get standard deviation? Can the
> mean and SD be obtained separately for both the treatments being compared?
>
> Thanks
>
> Sridhar Acharya
>
>
>
> On 6/11/2014 10:32 AM, Michael Love wrote:
>> hi Sridhar,
>>
>>
>>
>> On Wed, Jun 11, 2014 at 9:51 AM, Sridhar A Malkaram
>> <smalkaram at wvstateu.edu> wrote:
>>> Hi Michael,
>>>
>>> Thanks for explaining in detail.
>>> Just for clarification, I think we should add the "~" before the
>>> "genotype + time", right?
>> Yes good catch. I should have written "~ genotype + time"
>>
>>> Does "~" stand for formula notation in R?
>> Yes.
>>
>>> dds <- DESeq(dds, test="LRT", reduced=genotype + time)
>>>
>>>
>>> Also, when I do,
>>>
>>>         dds <- nbinomWaldTest(dds, betaPrior=FALSE)
>>>         res<-results(dds, name="time2.gen2")
>>>         head(res)
>>>
>>> The first two lines of the output:
>>>         log2 fold change: time2.gen2
>>>         Wald test p-value: time2.gen2
>>>         .....
>>>
>>> What two components is the log2 fold change between? I mean, how do I read time2.gen2 ?
>>> How can I get the two expression values that are being compared. The older version of DESeq used to give these values in the results.
>>>
>> The time series model with interactions between time and a condition
>> variable is a bit more complicated than the simple group comparisons.
>> There are not simply two levels being compared. The way that
>> interaction effects are encoded in model matrices in R is to write the
>> two terms of which the interaction is composed next to each other. We
>> use this convention as well. I'll try to explain the meaning of the
>> model you have specified. The model fits a number of main effects:
>>
>> Firstly, we have the change in expression levels over time for the
>> base level of the condition variable (so gen1). These effects are the
>> fold change (or addition in the log2 space) of the time 2 over time1
>> (the intercept, or base level), time 3 over time1, time 4 over time 1.
>> If you specify results(dds, name="time_2_vs_1") you would get a
>> results table with a simple description of the comparison. But this is
>> probably not of interest for you for testing, as it doesn't tell you
>> anything about the effect of the experimental condition of interest,
>> e.g. the genotype.
>>
>> Secondly, we have the fold change of genotype 2 over 1 for the first
>> time period. This is also obtained with results(dds,
>> name="gen_2_vs_1")
>>
>> Finally are the interaction terms. For example, "gen2.time2" is the
>> fold change for genotype 2 at time 2 which is not explained by the
>> genotype 2 main fold change and the time 2 main fold change multiplied
>> together (or added in the log2 scale). If the log fold change of this
>> intercept term equals 0, then we say there is no interaction. In other
>> words, the effect of genotype 2 is the same at time 2 as it was at
>> time 1, after accounting for the general effect of time on expression.
>>
>> Mike
>>
>>> -- With Kind Regards,
>>> Sridhar Acharya
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 6/10/2014 8:32 PM, Michael Love wrote:
>>>> I left out one detail, below:
>>>>
>>>> On Tue, Jun 10, 2014 at 8:20 PM, Michael Love
>>>> <michaelisaiahlove at gmail.com> wrote:
>>>>> hi Sridhar,
>>>>>
>>>>> Let's keep the discussion on the mailing list in case the question is
>>>>> relevant to others.
>>>>>
>>>>> After you have run:
>>>>>
>>>>> design(dds) <- ~ genotype + time + genotype:time
>>>>> dds <- DESeq(dds, test="LRT", reduced=genotype + time)
>>>>> res <- results(dds)
>>>>>
>>>>> The res object will contain the likelihood ratio test results, with
>>>>> small p-values for genes which have a genotype effect which is
>>>>> different than in the first time period. This tests all time periods
>>>>> after the first time period.
>>>>>
>>>>> You also see:
>>>>>
>>>>> resultsNames(ddsLRT)
>>>>> Intercept time2_vs_time1 time3_vs_time1 time4_vs_time1
>>>>> gen2_vs_gen1 time2.gen2 time3.gen2 time4.gen2
>>>>>
>>>>> The three of these which might be interesting for your experiment are:
>>>>>
>>>> Before the results line below you should call:
>>>>
>>>> dds <- nbinomWaldTest(dds, betaPrior=FALSE)
>>>>
>>>>> results(dds, name="time2.gen2")
>>>>> ...and same for time3.gen2, time4.gen2
>>>>>
>>>>> which will return a results table with Wald tests of the additional
>>>>> genotype effect in time 2 (additional beyond the genotype effect in
>>>>> the first time period). This is similar to the first LRT results
>>>>> above, except now we are asking for a different effect of genotype in
>>>>> a specific time period, not in all time periods.
>>>>>
>>>>> The other coefficients are the main effect terms. Results tables for
>>>>> these can also be built by using the 'name' argument to results().
>>>>> They are the intercept term, the effects of the different times over
>>>>> the initial time, and the effect for genotype 2 over 1 in the first
>>>>> time period. You don't want to use the contrast argument, which is for
>>>>> other kinds of models.
>>>>>
>>>>> Mike
>>>>>
>>>>> On Tue, Jun 10, 2014 at 3:38 PM, Michael Love
>>>>> <michaelisaiahlove at gmail.com> wrote:
>>>>>> hi Sridhar,
>>>>>>
>>>>>> On Tue, Jun 10, 2014 at 3:05 PM, Sridhar A Malkaram
>>>>>> <smalkaram at wvstateu.edu> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>>
>>>>>>> I have been a user of DESeq and recently DESeq2 for my research work.
>>>>>>> The latest DESeq2 seem to offer extensive differential testing options
>>>>>>> suitable for various experimental designs.
>>>>>>>
>>>>>>> Recently I wanted to use DESeq for a differential gene expression
>>>>>>> analysis between two plant genotypes across 4 different time points.
>>>>>>>
>>>>>>> I am basically a biologist and am finding hard to grasp the concepts of
>>>>>>> testing results. I'd be very grateful if you could help me understand
>>>>>>> some concepts (especially resultsNames) related to the DESeq2 package.
>>>>>>>
>>>>>>>
>>>>>>> My experimental design is as below
>>>>>>>
>>>>>>> design<- ~ genotype + time + genotype:time
>>>>>>>
>>>>>>> There are two levels in genotype and 4 levels in time.
>>>>>>> Basically I'd like to use binomLRT test to check if there is any
>>>>>>> difference in gene expression between the genotypes across the time points.
>>>>>>>
>>>>>>> dds<-DESeq(dds)  (dds is DESeq2 object  obtained from,
>>>>>>> dds<-DESeqDataSetFromMatrix(countData=counts, colData=coldata,
>>>>>>> design=design)
>>>>>>>
>>>>>>> and I am using the reduced model for the liklihood test
>>>>>>>
>>>>>> Here is where things are getting confused. You have already run
>>>>>> DESeq() using test="Wald". So it doesn't make sense at this point to
>>>>>> instead perform a likelihood ratio test. In our vignette we explain
>>>>>> this in the section on the LRT: "The likelihood ratio test can also be
>>>>>> specified using the test argument to DESeq, which substitutes
>>>>>> nbinomWaldTest with nbinomLRT."
>>>>>>
>>>>>>> Is the model correct per my research question (is there a (time
>>>>>>> influenced) difference  between genotypes)?
>>>>>>>
>>>>>> Yes. If you want to find those genes which show a time influenced
>>>>>> difference between genotypes, this is simply:
>>>>>>
>>>>>> dds <- DESeq(dds, test="LRT", reduced=genotype + time)
>>>>>> res <- results(dds)
>>>>>>
>>>>>> You can then use heatmaps to inspect the patterns of gene expression
>>>>>> for the differentially expressed genes. Visualization with heatmaps
>>>>>> are covered in the vignette.
>>>>>>
>>>>>> If you have other more specific questions about how to generate
>>>>>> results tables, I can answer them. With time series experiments, there
>>>>>> are many possible combinations to test, but rather than going through
>>>>>> all combinations, we recommend that users explore the results with
>>>>>> heatmaps.
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>>> Thanks,
>>>>>>> Sridhar Acharya
>>>>>>>
>>>>>>>
>>>>>>>         [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at r-project.org
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list