[BioC] limma modeling, paired samples and continuous variable
Gordon K Smyth
smyth at wehi.EDU.AU
Sat Apr 19 02:11:59 CEST 2014
Sorry, I meant "Dear Michela".
Gordon
On Sat, 19 Apr 2014, Gordon K Smyth wrote:
> Dear Riba,
>
> Well, a couple of points.
>
> First, if you want to treat Condition as numeric, then you must not declare
> it to be a factor. In R, a "factor" is a variable that is categorical
> instead of numeric.
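Here is a minimal base-R illustration of the difference. It uses a hypothetical Condition vector built from the four values mentioned in the thread (0, 0.5, 0.75, 1). As a numeric covariate, Condition contributes one slope column to the design. As a factor, it contributes one indicator column per non-baseline level.

```r
# Hypothetical Condition values: two samples at each of the four levels
Condition <- c(0, 0, 0.5, 0.5, 0.75, 0.75, 1, 1)

# Numeric: an intercept plus a single slope column for Condition
X_num <- model.matrix(~ Condition)
print(colnames(X_num))  # "(Intercept)" "Condition"

# Factor: an intercept plus one indicator column per non-baseline level
X_fac <- model.matrix(~ factor(Condition))
print(colnames(X_fac))  # "(Intercept)" "factor(Condition)0.5" "factor(Condition)0.75" "factor(Condition)1"
```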
>
> Second, Condition is entirely confounded with Genotype in your experiment.
> Samples for the same Genotype always have the same Condition. For example,
> all samples of Genotype pt01 have Condition=0, all samples of Genotype pt06
> have Condition=0.5, and so on. Hence you cannot include Condition and
> Genotype in the same model, because they are giving the same information. If
> you adjust for Genotype in the model, then you have necessarily also adjusted
> for the Condition.
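You can check the confounding numerically. The sketch below uses a small hypothetical layout, because the real targets file is not shown in the thread. It has four patients (pt01 to pt04), each sampled at stageA and stageB, and Condition is constant within each patient. The design matrix has six columns but only rank five. The pivoted QR decomposition from base R's qr() identifies Condition as the redundant column.

```r
# Hypothetical paired layout: 4 patients x 2 stages, Condition fixed per patient
Genotype  <- factor(rep(c("pt01", "pt02", "pt03", "pt04"), each = 2))
Disease   <- factor(rep(c("stageA", "stageB"), times = 4))
Condition <- rep(c(0, 0.5, 0.75, 1), each = 2)

X <- model.matrix(~ Genotype + Disease + Condition)
q <- qr(X)
print(c(columns = ncol(X), rank = q$rank))  # 6 columns, rank 5

# Columns pivoted beyond the numerical rank cannot be estimated
aliased <- colnames(X)[q$pivot[-seq_len(q$rank)]]
print(aliased)  # "Condition"

# Condition is exactly 0.5*pt02 + 0.75*pt03 + 1*pt04 (pt01 is the baseline, with Condition 0)
combo <- X[, c("Genotypept02", "Genotypept03", "Genotypept04")] %*% c(0.5, 0.75, 1)
print(all.equal(unname(X[, "Condition"]), as.vector(combo)))  # TRUE
```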
>
> You need:
>
> target<- readTargets("targetPTpGSp.txt")
> Genotype <- factor(target$Genotype)
> Disease<- factor(target$Disease)
> Condition <- target$Condition
>
> Then you can use either:
>
> design <- model.matrix(~Genotype+Disease)
>
> or
>
> design <- model.matrix(~Condition+Disease)
>
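Either design works because Condition is a function of Genotype. That makes the Condition+Disease design nested inside the Genotype+Disease design. The Condition model constrains the patient effects to lie on a straight line in Condition, whereas the Genotype model leaves them unconstrained. Below is a sketch on a hypothetical four-patient layout, not the real data.

```r
# Hypothetical layout: 4 patients x 2 stages, Condition fixed per patient
Genotype  <- factor(rep(c("pt01", "pt02", "pt03", "pt04"), each = 2))
Disease   <- factor(rep(c("stageA", "stageB"), times = 4))
Condition <- rep(c(0, 0.5, 0.75, 1), each = 2)

X_geno <- model.matrix(~ Genotype + Disease)   # intercept, 3 patient effects, stageB
X_cond <- model.matrix(~ Condition + Disease)  # intercept, Condition slope, stageB

# Both designs are full rank, so every coefficient is estimable
print(qr(X_geno)$rank == ncol(X_geno))  # TRUE
print(qr(X_cond)$rank == ncol(X_cond))  # TRUE

# Appending the Condition design's columns adds no new directions to the Genotype design
print(qr(cbind(X_geno, X_cond))$rank)   # 5, the same as X_geno alone
```

The Genotype design spends more residual degrees of freedom. In exchange, it makes no assumption about how the patient effects depend on Condition.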
> Best wishes
> Gordon
>
>> Date: Fri, 18 Apr 2014 11:19:56 +0200
>> From: Riba Michela <riba.michela at gmail.com>
>> To: Gordon K Smyth <smyth at wehi.edu.au>
>> Cc: Bioconductor mailing list <bioconductor at r-project.org>, James W. MacDonald <jmacdon at uw.edu>
>> Subject: Re: limma modeling, paired samples and continuous variable
>>
>> Hi,
>> thanks a lot for your answer. I'm forwarding the covariate matrix
>> of our design.
>>
>> target <- readTargets("targetPTpGSp.txt")
>> head(target)
>>
>> Genotype <- factor(target$Genotype)
>> Disease <- factor(target$Disease, levels=c("stageA", "stageB", "stageC"))
>>
>> # Condition <- factor(target$Condition)
>> r <- target$Condition  # this should be numeric
>>
>> Let me just recall the most important parts of what I would ideally
>> like to do and what I have already done.
>>
>> So far I have performed a paired-samples analysis using
>> design <- model.matrix(~Genotype+Disease)
>>
>> but I would also like to include a continuous parameter ("Condition")
>> in the model, because the genes differentially expressed between two
>> stages of the disease (e.g. the "stageB" coefficient in the fit from
>> the paired design above, which gives the stageB vs stageA
>> differentially expressed genes) seem to correlate somehow with the
>> "Condition" parameter.
>>
>> I could make the model work using
>> design<- model.matrix(~Disease+r)
>> but not using
>> design <- model.matrix(~Genotype+r)
>> nor using
>> design <- model.matrix(~Genotype+Disease+r)
>>
>> I'm not sure which design I should use to address this
>> question. The simplest one I could imagine,
>> design <- model.matrix(~Genotype+Disease+r)
>>
>> does not work.
>>
>> Thank you very much for your helpful support.
>>
>> Michela
>>
>>
>
>>> From smyth at wehi.edu.au Fri Apr 18 11:43:34 2014
>>> Date: Fri, 18 Apr 2014 11:43:28 +1000 (AUS Eastern Standard Time)
>>> From: Gordon K Smyth <smyth at wehi.edu.au>
>>> To: Riba Michela <riba.michela at gmail.com>
>>> Cc: Bioconductor mailing list <bioconductor at r-project.org>, James W. MacDonald <jmacdon at uw.edu>
>>> Subject: limma modeling, paired samples and continuous variable
>>>
>>>
>>>> Date: Thu, 17 Apr 2014 09:26:33 +0200
>>>> From: Riba Michela <riba.michela at gmail.com>
>>>> To: "James W. MacDonald" <jmacdon at uw.edu>
>>>> Cc: bioconductor at r-project.org
>>>> Subject: Re: [BioC] limma modeling, paired samples and continuous
>>>> variable
>>>>
>>>> Hi,
>>>> thanks a lot for your kind answer.
>>>> I should add one more observation:
>>>> the "r" parameter is indeed a numeric variable, and in this situation too
>>>> the result is the same.
>>>
>>> Actually it is not possible to get the same message as before if you have
>>> correctly coded r as a numeric variable.
>>>
>>>> Would it be reasonable to try and model the design as:
>>>> design <- model.matrix(~0+r)
>>>> # where "r" is a numeric variable?
>>>>
>>>> As for the points about the coefficients, I still have to think them through.
>>>
>>> No.
>>>
>>> To answer your question "if differential gene expression between two classes
>>> of disease is correlated with the r status", you probably need a Genotype:r
>>> interaction term in your model.
>>>
>>> You probably need to show us the whole targets frame for us to help you
>>> further. In other words, we need to see:
>>>
>>> data.frame(Genotype,Disease,r)
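A quick way to spot the problem Gordon is asking about is to cross-tabulate the covariates. The vectors below are hypothetical stand-ins for the real Genotype, Disease and r columns. If each Genotype has only one distinct r value, then r is confounded with Genotype and cannot be fitted alongside it.

```r
# Hypothetical stand-ins for the real covariates
Genotype <- factor(rep(c("pt01", "pt02", "pt03", "pt04"), each = 2))
Disease  <- factor(rep(c("stageA", "stageB"), times = 4))
r        <- rep(c(0, 0.5, 0.75, 1), each = 2)

print(data.frame(Genotype, Disease, r))

# One non-zero cell per Genotype row means r is fixed within each patient
print(table(Genotype, r))

# Number of distinct r values per Genotype
n_r_per_genotype <- tapply(r, Genotype, function(x) length(unique(x)))
print(all(n_r_per_genotype == 1))  # TRUE here: r is confounded with Genotype
```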
>>>
>>> Best wishes
>>> Gordon
>>>
>>>> Thanks a lot
>>>>
>>>> Michela
>>>> On 15 Apr 2014, at 16:23, James W. MacDonald <jmacdon at uw.edu> wrote:
>>>>
>>>>> Hi Michela,
>>>>>
>>>>> On 4/15/2014 5:05 AM, michela riba [guest] wrote:
>>>>>> Hi,
>>>>>> I'm sorry for re-posting the message, but I cannot find it in the
>>>>>> archive.
>>>>>> Thanks a lot for your attention.
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>> I would like to model and retrieve differential expression results
>>>>>> for an experimental design in which different patients (9) have
>>>>>> different disease classes (3 disease classes) and a feature
>>>>>> expressed as a percentage (0, 0.50, 0.75, 1).
>>>>>> Some conditions are replicated 2 or 3 times with respect to the
>>>>>> disease condition.
>>>>>> So far I have done an analysis with Genotype and Disease in
>>>>>> the model (as a paired-samples analysis):
>>>>>>
>>>>>> design <- model.matrix(~Genotype+Disease)
>>>>>> or
>>>>>> design <- model.matrix(~0+Genotype+Disease)
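The two paired designs above span the same column space and differ only in parametrisation. With an intercept, the first patient serves as the baseline. With ~0+, each patient gets its own column. The DiseasestageB column, and hence the stageB vs stageA comparison, is identical in both. Below is a sketch on a hypothetical three-patient, three-stage layout.

```r
# Hypothetical layout: 3 patients, each sampled at 3 disease stages
Genotype <- factor(rep(c("pt01", "pt02", "pt03"), each = 3))
Disease  <- factor(rep(c("stageA", "stageB", "stageC"), times = 3))

X1 <- model.matrix(~ Genotype + Disease)      # intercept + 2 patient effects + 2 stage effects
X0 <- model.matrix(~ 0 + Genotype + Disease)  # one column per patient + 2 stage effects

print(colnames(X1))
print(colnames(X0))

# Same rank, and stacking the two designs does not increase it: same column space
print(c(qr(X1)$rank, qr(X0)$rank, qr(cbind(X1, X0))$rank))  # 5 5 5
```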
>>>>>>
>>>>>> Now I would like to also include
>>>>>> a continuous variable, namely r,
>>>>>>
>>>>>> this way: design <- model.matrix(~Genotype+Disease+r)
>>>>>>
>>>>>> to see if differential gene expression between two classes of disease
>>>>>> is correlated with the r status,
>>>>>>
>>>>>> but so far I have not been able to get results:
>>>>>> Coefficients not estimable: r0,5 r0,75 r1
>>>>>> Warning message:
>>>>>> Partial NA coefficients for 15246 probe(s)
>>>>>
>>>>> This indicates that R is using your 'r' data as factor rather than
>>>>> numeric. I assume that is not what you want? If so you need to ensure
>>>>> that R thinks that 'r' is a numeric vector.
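The coefficient names in the error (r0,5 r0,75 r1) also suggest why r ended up as a factor. The values appear to use a comma as the decimal separator, as is common under an Italian locale, so they were probably read as text rather than numbers. This is an inference from the error message, not something confirmed in the thread. Here is a minimal base-R sketch of the symptom and one fix, using hypothetical values.

```r
# Hypothetical column as it might be read from a file written with comma decimals
r <- factor(c("0", "0", "0,5", "0,5", "0,75", "0,75", "1", "1"))

# As a factor, r expands into indicator columns named like those in the error message
print(colnames(model.matrix(~ r)))  # "(Intercept)" "r0,5" "r0,75" "r1"

# Pitfall: as.numeric() on a factor returns the internal level codes, not the values
print(as.numeric(r))  # 1 1 2 2 3 3 4 4

# Convert via character, replacing the decimal comma with a point
r_num <- as.numeric(sub(",", ".", as.character(r), fixed = TRUE))
print(r_num)
print(is.numeric(r_num))  # TRUE
```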
>>>>>
>>>>> If you really are trying to treat 'r' as a factor, then note that either
>>>>> you have an over-specified model (meaning you are trying to estimate
>>>>> more parameters than you have observations), or three of the
>>>>> coefficients for 'r' are linear combinations of existing coefficients
>>>>> when you already have genotype and disease in the model.
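The second possibility can be reproduced in base R with a hypothetical layout in which each patient has a single r value. When r is treated as a factor alongside Genotype, its three indicator columns are linear combinations of the Genotype columns. These are the same three coefficients reported as not estimable in the original message. The check uses a pivoted QR decomposition and is a sketch of the idea, not limma's own code.

```r
# Hypothetical layout: 4 patients x 2 stages, one r value per patient
Genotype <- factor(rep(c("pt01", "pt02", "pt03", "pt04"), each = 2))
Disease  <- factor(rep(c("stageA", "stageB"), times = 4))
r        <- factor(rep(c("0", "0,5", "0,75", "1"), each = 2))  # r wrongly held as a factor

X <- model.matrix(~ Genotype + Disease + r)
q <- qr(X)
print(c(columns = ncol(X), rank = q$rank))  # 8 columns, rank 5

# Columns pivoted beyond the numerical rank cannot be estimated
not_estimable <- colnames(X)[q$pivot[-seq_len(q$rank)]]
print(not_estimable)  # the three r columns
```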
>>>>>
>>>>> Best,
>>>>>
>>>>> Jim
>>>>>
>>>>>
>>>>>>
>>>>>> If I use
>>>>>> design <- model.matrix(~Disease+r)
>>>>>> it works, but it does not take the different genotypes into account.
>>>>>>
>>>>>> Thank you very much for your attention.
>>>>>>
>>>>>> Thanks a lot
>>>>>>
>>>>>> Michela
>>>>>>
>>>>>> -- output of sessionInfo():
>>>>>>
>>>>>> R version 3.0.2 (2013-09-25)
>>>>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>>>>
>>>>>> locale:
>>>>>> [1] it_IT.UTF-8/it_IT.UTF-8/it_IT.UTF-8/C/it_IT.UTF-8/it_IT.UTF-8
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>>
>>>>>> other attached packages:
>>>>>> [1] limma_3.18.13
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] tools_3.0.2
>>>>>>
>>>>> --
>>>>> James W. MacDonald, M.S.
>>>>> Biostatistician
>>>>> University of Washington
>>>>> Environmental and Occupational Health Sciences
>>>>> 4225 Roosevelt Way NE, # 100
>>>>> Seattle WA 98105-6099
>>>>>
>>>>
>>>> Dr. Michela Riba
>>>> Genome Function Unit
>>>> Center for Translational Genomics and Bioinformatics
>>>> San Raffaele Scientific Institute
>>>> Via Olgettina 58
>>>> 20132 Milano
>>>> Italy
>>>>
>>>> lab: +39 02 2643 9114
>>>> skype: mic_mir32
>>>> riba.michela at gmail.com
>>>> riba.michela at hsr.it
>
More information about the Bioconductor mailing list