[BioC] Repeated Measures mRNA expression analysis

Mon Jul 1 16:16:04 CEST 2013

Hi Charles,

On 7/1/2013 9:07 AM, Charles Determan Jr wrote:
> I apologize for a second post, but I want to bring this question back up as
> I still cannot find a definitive answer on my own.  In brief, I am
> wondering about the design matrix when testing for differential expression
> between two groups within which each sample has been measured at
> consecutive timepoints (repeated measures).  Therefore, if my
> interpretations are correct, I need a two-way analysis that recognizes
> dependence between consecutive measurements.  I am familiar with limma,
> edgeR and DESeq but am uncertain how to construct an appropriate design matrix
> for these comparisons.  The best I can guess is that I add a 'Subject'
> factor to the design matrix corresponding to each unique sample to correct
> for dependence; is this correct?

It depends on how sophisticated you want to get, or alternatively what 
assumptions you are willing to make.

The simplest thing to do would be to block on subject (see the blocking 
portion of the limma User's guide, starting on p. 42). This makes very 
simple assumptions about the data, namely that the differences between 
subjects can be accounted for by the mean of each subject.
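A minimal sketch of one way to block on subject in limma: estimate a consensus within-subject correlation with `duplicateCorrelation()` and pass it to `lmFit()` via `block=` and `correlation=`. It assumes the layout in Charles's table further down (subjects 1, 2 and 4 treated, 3, 5 and 6 control, each measured at 0hr, 1hr and 2hr, with the elided rows assumed to follow the same pattern) and uses simulated values in place of real log-expression data. For RNA-seq counts, `y` would typically be replaced by the output of `voom()`. The design part is base R; the limma steps run only if the package is installed.

```r
# Hypothetical layout (assumed from the table below): 6 subjects x 3 time points
subject <- factor(rep(1:6, times = 3))
group   <- factor(rep(c("Treated", "Treated", "Control",
                        "Treated", "Control", "Control"), times = 3))
time    <- factor(rep(c("0hr", "1hr", "2hr"), each = 6))

# One coefficient per group/time combination (cell-means parameterisation)
grouptime <- factor(paste(group, time, sep = "."))
design <- model.matrix(~ 0 + grouptime)
colnames(design) <- levels(grouptime)

# Simulated log-expression matrix (200 genes x 18 samples), illustration only
set.seed(1)
y <- matrix(rnorm(200 * 18), nrow = 200, ncol = 18)

if (requireNamespace("limma", quietly = TRUE)) {
  # Consensus correlation between repeated measurements on the same subject
  corfit <- limma::duplicateCorrelation(y, design, block = subject)
  # Gene-wise fits that treat each subject as a correlated block
  fit <- limma::lmFit(y, design, block = subject,
                      correlation = corfit$consensus.correlation)
  # Example contrast: treated vs control at 2hr
  cm <- limma::makeContrasts(Treated.2hr - Control.2hr, levels = design)
  fit2 <- limma::eBayes(limma::contrasts.fit(fit, cm))
  print(limma::topTable(fit2, number = 5))
}
```

Because every group/time combination has its own coefficient, other comparisons (for example the change from 0hr within each group) can be written as further contrasts against the same fit.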

Best,

Jim

>
> My sincere regards,
> Charles
>
>
> On Wed, Jun 26, 2013 at 11:54 AM, Charles Determan Jr <deter088 at umn.edu> wrote:
>
>> To help clarify further, here is a data frame of the design.
>>
>>     subject  group times
>> 1        1 Treated    0hr
>> 2        2 Treated    0hr
>> 3        3 Control    0hr
>> 4        4 Treated    0hr
>> 5        5 Control    0hr
>> 6        6 Control    0hr
>> 7        1 Treated    1hr
>> 8        2 Treated    1hr
>> 9        3 Control    1hr
>> ...
>> 17       5 Control    2hr
>> 18       6 Control    2hr
>>
>> My thought process has been as follows:
>>
>> In the edgeR User's Guide there is a treatment-combination example:
>>
>>> targets
>>     Sample   Treat Time
>> 1  Sample1 Placebo   0h
>> 2  Sample2 Placebo   0h
>> 3  Sample3 Placebo   1h
>> 4  Sample4 Placebo   1h
>> 5  Sample5 Placebo   2h
>> 6  Sample6 Placebo   2h
>> 7  Sample1    Drug   0h
>> 8  Sample2    Drug   0h
>> 9  Sample3    Drug   1h
>> 10 Sample4    Drug   1h
>> 11 Sample5    Drug   2h
>> 12 Sample6    Drug   2h
>>
>> which combines the treatment and time factors into a single grouping factor (e.g. Drug.1, Placebo.1, Drug.2, etc.).
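In base R, the combined-factor design from that example can be sketched as follows, re-typing the `targets` table quoted above. Each sample gets exactly one non-zero column, and nothing in the design links, say, Sample1 under Placebo to Sample1 under Drug, so on its own this design treats all twelve libraries as independent.

```r
# targets table as quoted above
targets <- data.frame(
  Sample = paste0("Sample", rep(1:6, times = 2)),
  Treat  = rep(c("Placebo", "Drug"), each = 6),
  Time   = rep(c("0h", "0h", "1h", "1h", "2h", "2h"), times = 2)
)

# Merge treatment and time into one factor, e.g. "Drug.1h"
Group <- factor(paste(targets$Treat, targets$Time, sep = "."))

# No intercept: one column (coefficient) per treatment/time combination
design <- model.matrix(~ 0 + Group)
colnames(design) <- levels(Group)
print(design)
```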
>>
>> This seems potentially appropriate, but it appears to assume independence between samples, whereas my data consist of what you could call 'true repeated measures' on the same sample.  This seems to call for the paired-samples and blocking examples, which also include 'subject' as a factor, for example:
>>
>> design <- model.matrix(~Subject+Treatment)
>>
>> This leads me to guess that a combination of these techniques is required: perhaps merging the time and group factors in my dataset (see above) into a single factor 'newgroup' (e.g. Control.0, Control.1, Treatment.0, etc.) and then creating the model formula:
>>
>> design <- model.matrix(~Subject+newgroup)
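A quick base-R check of this proposed formula, assuming the elided rows of the table above follow the same pattern (subjects 1-6 at 0hr, 1hr and 2hr). Because each subject belongs to only one group, the subject terms absorb the between-group difference, so the design matrix is not of full column rank and one coefficient cannot be estimated. The response `y` below is arbitrary noise used only to show which column `lm()` drops.

```r
# Hypothetical reconstruction of the layout: subjects 1, 2 and 4 treated
subject  <- factor(rep(1:6, times = 3))
group    <- rep(c("Treated", "Treated", "Control",
                  "Treated", "Control", "Control"), times = 3)
times    <- rep(c("0", "1", "2"), each = 6)
newgroup <- factor(paste(group, times, sep = "."))

design <- model.matrix(~ subject + newgroup)
print(ncol(design))     # 11
print(qr(design)$rank)  # 10: one column is a linear combination of the others

# lm() detects the aliased column and reports its coefficient as NA
set.seed(1)
y <- rnorm(nrow(design))  # arbitrary response, illustration only
fit <- lm(y ~ subject + newgroup)
print(coef(fit)[is.na(coef(fit))])
```

The baseline group difference is what gets lost; within-group changes over time, and differences in those changes between groups, remain estimable under this design.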
>>
>> Does this seem appropriate, or am I way off base and overthinking this?  Thanks for any suggestions.
>>
>> Regards,
>> Charles
>>
>>
>>
>> On Tue, Jun 25, 2013 at 11:11 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>
>>> Charles,
>>>
>>> Are there only 2 biological units in your experiment?  (One for treatment
>>> and one for control?)  Or do you have multiple biological units in each
>>> group?  Surely it must be the latter but, if so, your model does not take
>>> this into account.
>>>
>>> What questions do you want to test?
>>>
>>> Best
>>> Gordon
>>>
>>>
>>>
>>> On Tue, 25 Jun 2013, Charles Determan Jr wrote:
>>>
>>>> Gordon,
>>>> I apologize for not being more definitive with my description.  Your
>>>> first description is what I intended: consecutive measurements on the same
>>>> biological units.  I will look over the comments in the link you
>>>> provided.
>>>> Thank you for your insight, I appreciate any further thoughts you may
>>>> have.
>>>>
>>>> Regards,
>>>> Charles
>>>>
>>>>
>>>> On Tue, Jun 25, 2013 at 6:57 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>>>
>>>>> Dear Charles,
>>>>> The term "repeated measures" describes a situation in which repeated
>>>>> measurements are made on the same biological unit.  Hence the repeated
>>>>> measurements are correlated.  It is not clear from the brief information
>>>>> you give whether this is the case, or whether the different time points
>>>>> derive from independent biological samples.
>>>>>
>>>>> The model you give might or might not be correct, depending on the
>>>>> experimental units and the hypotheses that you plan to test.  For most
>>>>> experiments it is not the right approach, for reasons that I have pointed
>>>>> out elsewhere:
>>>>>
>>>>> https://www.stat.math.ethz.ch/pipermail/bioconductor/2013-June/053297.html
>>>>>
>>>>> Best wishes
>>>>> Gordon
>>>>>
>>>>>
>>>>>> Date: Mon, 24 Jun 2013 15:08:48 -0500
>>>>>> From: Charles Determan Jr <deter088 at umn.edu>
>>>>>> To: bioconductor at r-project.org
>>>>>> Subject: [BioC] Repeated Measures mRNA expression analysis
>>>>>>
>>>>>> Greetings,
>>>>>>
>>>>>> I need to analyze data collected from an RNA-seq experiment.  This
>>>>>> consists of comparing two groups (control vs. treatment) and repeated
>>>>>> sampling (1 hour, 2 hours, 3 hours).  If this were a univariate problem I
>>>>>> know I would use a 2-way rmANOVA analysis but this is RNA-seq and I have
>>>>>> thousands of variables.  I am very familiar with multiple packages for RNA
>>>>>> differential expression analysis (e.g. DESeq2, edgeR, limma, etc.) but I
>>>>>> have been unable to figure out the most appropriate way to analyze
>>>>>> such data in this circumstance.  The closest answer I can find within the
>>>>>> DESeq2 and edgeR manuals (limma is somewhat confusing to me) is to place the
>>>>>> main treatment of interest at the end of the design formula, for example:
>>>>>>
>>>>>> design(dds) <- formula(~ time + treatment)
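For comparison, a base-R look at the columns such an additive formula generates, using a hypothetical sample table with the time labels from this post and six subjects split between the two groups (the group assignment is assumed, mirroring the later table). The formula has no subject term, so the three measurements on each subject are treated as independent samples, and it fits a single treatment effect shared by all time points.

```r
# Hypothetical sample table: 6 subjects x 3 time points
coldata <- data.frame(
  subject   = factor(rep(1:6, times = 3)),
  treatment = factor(rep(c("treatment", "treatment", "control",
                           "treatment", "control", "control"), times = 3)),
  time      = factor(rep(c("1h", "2h", "3h"), each = 6))
)

# Same right-hand side as the DESeq2 design formula above
X <- model.matrix(~ time + treatment, data = coldata)

# "(Intercept)" "time2h" "time3h" "treatmenttreatment": subject is recorded
# in coldata but never enters the design
print(colnames(X))
```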
>>>>>>
>>>>>> Is this what is considered the appropriate way to address repeated measures
>>>>>> in mRNA expression experiments?  Any thoughts are appreciated.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> --
>>>>>> Charles Determan
>>>>>> Integrated Biosciences PhD Candidate
>>>>>> University of Minnesota
>>>>>>
>>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099