[BioC] ANCOVA microarray time-course continuous & categorical variables / Limma extensions
Gordon K Smyth
smyth at wehi.EDU.AU
Thu Oct 24 23:32:20 CEST 2013
Dear Michael,
On Thu, 24 Oct 2013, Michael Breen wrote:
> Hi all,
>
> Recently, there has been considerable interest in these types of studies,
> i.e. those outlined in section 8.7 Multi-level Experiments (in the limma
> manual) while including continuous variables that may have overall effects
> on gene-expression outcomes. Not only is it informative to correlate
> gene-expression to continuous variables, (for example gene-expression with
> tumor size) it is also informative to be able to subtract away any
> erroneous influences on gene-expression patters (for example, conducting a
> time-course study over 2 time points consisting of 2 groups, 1 disease and
> 1 healthy, while incorporating the number of cigarettes your cohorts smoke
> to the model, only to subtract that measure away in order to measure purely
> the main-effects- in this vague example = time and condition).
>
> From this banter, I have gathered that these techniques are applicable to
> Limma, although not apparently so from the manual itself. Could Limma and
> the community with such scenarios benefit from an example added to the
> Limma manual incorporating continuous variables?
Limma can handle any linear model. Just form the design matrix and
proceed.
> For now, we believe to have the model set-up properly to take into
> consideration a continuous variable over time between two groups. We want
> to test the main-effect of Time and Condition while removing additive
> effects of a continuous variable - here I have labeled it CES. Given that
> the syntax below is logical, I hope it can bring clarity to others with
> similar scenarios.
>
> #Main effects of time and Condition
> Treat <- factor(paste(targets$Condition,targets$Time,sep="."))
> #Continuous covariable for Cases.
> CES <- (targets$CES_Case)
> design <- model.matrix(~0+Treat+CES)
> colnames(design)
> [1] "Case.Time2" "Case.Time1" "Control.Time2"
> [4] "Control.Time1" "CES"
>
> corfit <- duplicateCorrelation(exprs,design,block=targets$Subject)
> corfit$consensus
> fit <-
> lmFit(exprs,design,block=targets$Subject,correlation=corfit$consensus)
>
> #As an example
> cm <- makeContrasts(
> Cases = Case.Time2-Case.Time1,
> Controls = Control.Time2-Control.Time1,
> Contrast= (Case.Time2-Case.Time1)-(Control.Time2-Control.Time1),
> aContinuousEffect = CES,
> Cases_no_ContinuousEffect=Case.Time2-Case.Time1-CES,
> levels=design)
This last contrast is unnecessary (and incorrect). All the treatment
effects have already been adjusted for the continuous covariate by
including the covariate in the linear model. The other contrasts already
do what you want. There is no need to manually subtract the covariate
coefficient from the interaction. Moreover the subtracted contrast has no
meaning because the continuous coefficient and the interaction are
different scales.
Best wishes
Gordon
> #Cases = Measures longitudinal effects overtime of cases
> #Controls =Measures longitudinal effects overtime of controls
> #Contrast=Measures longitudinal contrasts between cases and controls
> #aContinuousEffect= the effects of our continuous covariable
> #Cases_no_ContinuousEffect = Measures longitudinal effects overtime of
> cases without erroneous effects of the continuous variable
>
> fit2 <- contrasts.fit(fit, cm)
> fit2 <- eBayes(fit2)
> colnames(fit2)
>
>
> Best,
>
> Michael
>
>
>
>
>
>
>
>
>
>
>
> On Wed, Oct 23, 2013 at 11:16 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>>
>> Date: Wed, 23 Oct 2013 08:03:19 +0100
>>> From: Michael Breen <breenbioinformatics at gmail.com**>
>>> To: Ryan <rct at thompsonclan.org>
>>> Cc: Bioconductor Mailing List <bioconductor at stat.math.ethz.**ch<bioconductor at stat.math.ethz.ch>
>>>> ,
>>> "bioconductor at r-project.org" <bioconductor at r-project.org>
>>> Subject: Re: [BioC] ANCOVA microarray time-course continuous &
>>> categorical variables / Limma extensions
>>>
>>> I have done this in the past, although I did not have much faith in it,
>>> maybe because of the way I create the design matrix.
>>>
>>> For example I do it this way:
>>>
>>> Treat <- factor(paste(targets$**Condition,targets$Time,sep="."**))
>>> design <- model.matrix(~0+Treat)
>>> colnames(design) <- levels(Treat)
>>> colnames(design)
>>> [1] "Case.Time2" "Case.Time1" "Control.Time2" "Control.Time1"
>>>
>>>
>>> #introduce continuous variable, add to design
>>> CES <- factor(targets$CES)
>>>
>>
>> Here is the problem. You are declaring your continuous variable to be
>> categorical, which is presumably not want you want to do.
>>
>> In R, a "factor" means a categorical variable. Anything that is not a
>> factor is considered to be continuous.
>>
>> You could either remove the factor() command here, or else simply use
>>
>> design <- model.matrix(~0+Treat+CES)
>>
>> Best wishes
>> Gordon
>>
>> design2 <- cbind(design, CES)
>>> colnames(design2)
>>> [1] "Case.Time2" "Case.Time1" "Control.Time2" "Control.Time1"
>>> "CES"
>>>
>>> I go on to duplicate correlation and lmfit then make these contrasts to
>>> substract away the effect of the continuous variable from our
>>> observations.
>>>
>>> cm <- makeContrasts(
>>> CaseEffect = Case.Time2-Case.Time1,
>>> ControlEffect = Control.Time2-Control.Time1,
>>> ContrastEffect = (Case.Time2-Case.Time1)-(**Control.Time2-Control.Time1),
>>> CaseEffectNoCov = (Case.Time2-Case.Time1)-(CES),
>>> ControlEffectNoCov = (Case.Time2-Case.Time1)-(CES),
>>> ContrastEffectNoCov = (Case.Time2-Case.Time1)-(CES) -
>>> (Case.Time2-Time1)-(CES),
>>> levels=designnew)
>>>
>>> etc...
>>>
>>> Our results are 'different' so it would be useful to have another opinion
>>> regarding this set-up.
>>>
>>> Michael
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Oct 22, 2013 at 11:03 PM, Ryan <rct at thompsonclan.org> wrote:
>>>
>>> Including continuous covariates in design matrices is R is just as easy
>>>> as
>>>> including categorical ones. Instead of creating a column for each degree
>>>> of
>>>> freedom in the categorical variable, you just end up with a single column
>>>> that simply contains the values of the continuous variable. Try using the
>>>> model.matrix function with a combination of your categorical variables
>>>> and
>>>> continuous ones to see what it does.
>>>>
>>>> -Ryan
>>>>
>>>>
>>>> On Tue Oct 22 14:36:38 2013, Richard Friedman wrote:
>>>>
>>>>
>>>>> On Oct 22, 2013, at 5:30 PM, Michael Breen wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>>>
>>>>>> Our lab analyzes gene-expression from microarray and RNAseq
>>>>>> platforms. Currently, I am looking for a package to test
>>>>>> differential expression (DE) while considering continuous variables
>>>>>> that may alter gene-expression profiles. In other words, an ANCOVA
>>>>>> type tool. I am quite familiar with Limma (ANOVA) but including
>>>>>> continuous variables is not very well described.
>>>>>>
>>>>>> Specifically, we have a project were two groups can be modeled over
>>>>>> the same 2 time points. One group starts healthy and ends in a
>>>>>> disease state. The other group starts healthy and remains healthy.
>>>>>>
>>>>>> We are interested in identifying genes uniquely responding within
>>>>>> one group and not in the other. Thus, we have implemented a
>>>>>> longitudinal contrast with linear modeling through Limma. However,
>>>>>> we are also interested in adding one or two continuous variables
>>>>>> (tumor size, time spent meditating, the amount of drinks one
>>>>>> consumes etc..) to check if gene expression differences or
>>>>>> similarities may be due to these factors instead of due to
>>>>>> belonging to a certain class. Limma seems to test categorical
>>>>>> variables, but I don't think it is capable of either correlating
>>>>>> gene-expression to continuous variables.
>>>>>>
>>>>>> If not, can someone recommend a tool which may be appropriate for
>>>>>> such a situation?
>>>>>>
>>>>>> Yours,
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>> Dear Michael and list,
>>>>>
>>>>> I think that you write the design and contrast matrices
>>>>> exactly as you would for an ANCOVA in R only you do the
>>>>> fit and Bayesian correction in Limma.
>>>>>
>>>>> Perhaps someone who has had experience doing this
>>>>> kind of analysis can comment.
>>>>>
>>>>> Best wishes,
>>>>> Rich
>>>>>
>>>>> Richard A. Friedman, PhD
>>>>> Associate Research Scientist,
>>>>> Biomedical Informatics Shared Resource
>>>>> Herbert Irving Comprehensive Cancer Center (HICCC)
>>>>> Lecturer,
>>>>> Department of Biomedical Informatics (DBMI)
>>>>> Educational Coordinator,
>>>>> Center for Computational Biology and Bioinformatics (C2B2)/
>>>>> National Center for Multiscale Analysis of Genomic Networks (MAGNet)/
>>>>> Columbia Department of Systems Biology
>>>>> Room 824
>>>>> Irving Cancer Research Center
>>>>> Columbia University
>>>>> 1130 St. Nicholas Ave
>>>>> New York, NY 10032
>>>>> (212)851-4765 (voice)
>>>>> friedman at cancercenter.**columb**ia.edu <http://columbia.edu> <
>>>>> friedman at cancercenter.**columbia.edu<friedman at cancercenter.columbia.edu>
>>>>>>
>>>>> http://cancercenter.columbia.****edu/~friedman/<http://**
>>>>> cancercenter.columbia.edu/~**friedman/<http://cancercenter.columbia.edu/~friedman/>
>>>>>>
>>>>>
>>>>> In memoriam, Frederik Pohl
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list