[BioC] Limma; a kind of extended paired analyses with or without treatment

James W. MacDonald jmacdon at uw.edu
Mon Oct 15 16:49:35 CEST 2012


Hi John,

On 10/14/2012 6:51 AM, john herbert wrote:
> Dear James,
> I attempted "design<- model.matrix(~0+treatment*time+sample)"; the
> design matrix did not make sense to me, unfortunately. So I stepped
> back and did your first suggestion.
>
> patient<- factor(targets$patient)
> treat<- factor(targets$treat, levels=c("N","Y"))
> time<- factor(targets$time)
>
> treatment<- factor(paste(treat,time,sep="."))

design <- model.matrix(~0+treatment+patient)

Then continue as you have, omitting the call to duplicateCorrelation(), 
and no block or correlation args for lmFit().

I don't think what you have done is technically wrong, you are just 
making different assumptions as compared to fitting patient as a random 
effect.

If you fit patient as a fixed effect, you are assuming that you can 
model the inter-patient differences as a shift in mean expression. In 
other words, you are assuming that for a given gene, patient 1 has on 
average a higher expression than patient 2, and you can account for that 
by subtracting a constant from the expression of patient 1 and then 
making comparisons. This is a pretty simple assumption, and not likely 
to be questioned by a reviewer, in addition to being easy to explain.

If you fit patient as a random effect, you are fitting a different type 
of model that incorporates the correlation structure between related 
samples. In this case you are estimating the intra-patient correlation 
and then assuming that a.) the intra-patient correlation is the same for 
all patients, and b.) that the intra-patient correlation is the same for 
all samples for a given patient. So for this model you are making 
additional assumptions. Usually you want to go with the simplest model 
possible, and since you are not precluded from fitting everything as a 
fixed effect, that is probably the safest way to go.

Best,

Jim



> design<- model.matrix(~0+treatment)
> colnames(design)<- levels(treatment)
>
> #>  head(design)
> #   N.0 N.24 N.72 Y.24 Y.72
> # 1   0    0    0    1    0
> # 2   0    0    0    0    1
>
> corfit<- duplicateCorrelation(y_norm_ave,design,block=targets$patient)
>
> fit<- lmFit(y_norm_ave, design, block=targets$patient,
> correlation=corfit$consensus)
>
> cm<- makeContrasts(
> Treat24_vs_Normal24 = Y.24-N.24,
> Treat72_vs_Normal72 = Y.72-N.72,
> Treat24_vs_Normal0 = Y.24-N.0,
> Treat72_vs_Normal0 = Y.72-N.0,
> Normal24_vs_Normal0 = N.24-N.0,
> Normal72_vs_Normal0 = N.72-N.0,
> Diff=(Y.72-N.0)-(N.72-N.0),
> levels=design)
>
> fit2<- contrasts.fit(fit, cm)
> fit2<- eBayes(fit2)
>
> topTable(fit2, coef="Treat24_vs_Normal24")
>
> topTable(fit2, coef="Treat72_vs_Normal72").......
>
> A look at some top diff genes, looks reasonably encouraging based on
> the biology but take your observation that there is no need for a
> random effect model. This style of code and matrix I can make sense
> of; is there a way of integrating your second suggestion into this
> code, so that the design and contrast matrices make sense to me?
>
> design<- model.matrix(~0+treatment) # (above)
>
> and
>
> design<- model.matrix(~0+treatment*time+sample)
>
> produce very different looking design matrices.
>
> Kind regards,
>
> John.
>
> On Fri, Oct 12, 2012 at 8:23 AM, john herbert<arraystruggles at gmail.com>  wrote:
>> Thanks James,
>> appreciated as you have saved me a lot of time.
>>
>> John.
>>
>> On Thu, Oct 11, 2012 at 7:48 PM, James W. MacDonald<jmacdon at uw.edu>  wrote:
>>> Ugh. Jumped the gun. This does *not* require you to fit a random effects
>>> model, as you have done every treatment to cells from each patient. You can
>>> just block on Sample and then make your comparisons.
>>>
>>> In other words, if you add Sample to your design matrix, you will in effect
>>> be removing the patient-specific effect. Something like
>>>
>>> design<- model.matrix(~0+treatment*time+sample)
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>
>>> On 10/11/2012 2:27 PM, john herbert wrote:
>>>> Thanks James,
>>>> This does not have time course but judging by your answer, I can just
>>>> add this in, in place of, say, tissue.
>>>>
>>>> Kind regards,
>>>>
>>>> John.
>>>>
>>>> On Thu, Oct 11, 2012 at 7:23 PM, James W. MacDonald<jmacdon at uw.edu>
>>>> wrote:
>>>>> Hi John,
>>>>>
>>>>>
>>>>> On 10/11/2012 2:15 PM, john herbert wrote:
>>>>>> Dear all.
>>>>>> I have been pondering about constructing a design matrix based on the
>>>>>> Limma user guide, where I combine a time course with a paired
>>>>>> analyses. The targets file looks like;
>>>>>>
>>>>>> Sample  treatment       time
>>>>>> 1       control 24
>>>>>> 1       control 72
>>>>>> 1       control 0
>>>>>> 1       treatment       24
>>>>>> 1       treatment       72
>>>>>> 2       control 24
>>>>>> 2       control 72
>>>>>> 2       control 0
>>>>>> 2       treatment       24
>>>>>> 2       treatment       72
>>>>>> 3       control 24
>>>>>> 3       control 72
>>>>>> 3       control 0
>>>>>> 3       treatment       24
>>>>>> 3       treatment       72
>>>>>>
>>>>>> Sample number refers to an individuals cancer cells, treatment refers
>>>>>> to added drug or not and numbers are in hours (time elapsed). So it is
>>>>>> a kind of paired, as patient variability is to be considered. The
>>>>>> control sample at 0 is the same as treatment at time 0 as these are
>>>>>> the same cells without any time/treatment.
>>>>>>
>>>>>> Please could someone help me understand how I can construct a design
>>>>>> matrix and to understand how I can extract differently expressed genes
>>>>>> that come about due to time, due to treatment and interaction of them
>>>>>> both.
>>>>>>
>>>>>> Any pointers appreciated, though I am trying to see if the examples in
>>>>>> the manual can be applied to this scenario.
>>>>>
>>>>> See the multi-level experiment example in the user guide, starting on p.
>>>>> 47.
>>>>>
>>>>> Best,
>>>>>
>>>>> Jim
>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> John.
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>> --
>>>>> James W. MacDonald, M.S.
>>>>> Biostatistician
>>>>> University of Washington
>>>>> Environmental and Occupational Health Sciences
>>>>> 4225 Roosevelt Way NE, # 100
>>>>> Seattle WA 98105-6099
>>>>>
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>>>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list