[BioC] limma modeling, paired samples and continuous variable

Gordon K Smyth smyth at wehi.EDU.AU
Sat Apr 19 02:11:59 CEST 2014


Sorry, I meant "Dear Michela".

Gordon

On Sat, 19 Apr 2014, Gordon K Smyth wrote:

> Dear Riba,
>
> Well, a couple of points.
>
> First, if you want to treat Condition as numeric, then you must not declare 
> it to be a factor.  In R, a "factor" is a variable that is categorical 
> instead of numeric.
>
> Second, Condition is entirely confounded with Genotype in your experiment. 
> Samples for the same Genotype always have the same Condition. For example, 
> all samples of Genotype pt01 have Condition=0, all samples of Genotype pt06 
> have Condition=0.5, and so on.  Hence you cannot include Condition and 
> Genotype in the same model, because they are giving the same information.  If 
> you adjust for Genotype in the model, then you have necessarily also adjusted 
> for the Condition.
>
> You need:
>
>  target<- readTargets("targetPTpGSp.txt")
>  Genotype <- factor(target$Genotype)
>  Disease<- factor(target$Disease)
>  Condition <- target$Condition
>
> Then can use either:
>
>  design <- model.matrix(~Genotype+Disease)
>
> or
>
>  design <- mode.matrix(~Condition+Disease)
>
> Best wishes
> Gordon
>
>> Date: Fri, 18 Apr 2014 11:19:56 +0200
>> From: Riba Michela <riba.michela at gmail.com>
>> To: Gordon K Smyth <smyth at wehi.edu.au>
>> Cc: Bioconductor mailing list <bioconductor at r-project.org>, James W. 
> MacDonald <jmacdon at uw.edu>
>> Subject: Re: limma modeling, paired samples and continuous variable
>> 
>> Hi,
>> thanks a lot for your answer and I'm forwarding the covariate matrix
>> of our design.
>> 
>> target<- readTargets("targetPTpGSp.txt")
>> head(target)
>> 
>> Genotype <- factor(target$Genotype)
>> Disease<- factor(target$Disease, levels=c("stageA", "stageB",
>> "stageC"))
>> 
>> # Condition <-factor(target$Condition)
>> r<-target$Condition #this should be numeric
>> 
>> I'm just recalling the most striking parts of what I ideally would
>> try to do and what I have already did.
>> 
>> Till now I have performed a paired samples analysis using
>> design <- model.matrix(~Genotype+Disease)
>> 
>> but I would like to include also a continuous parameter ("Condition")
>> in the model because it seems
>> that differentially expressed genes in two different stages of the
>> disease e.g. "stageB" results in the fit , coming from the above
>> specified paired sample design and indicating stageB-stageA
>> differentially expressed genes)
>> somehow correlate with the "Condition"parameter
>> 
>> At the model I could make it function using
>> design<- model.matrix(~Disease+r)
>> but not using
>> design <- model.matrix(~Genotype+r)
>> nor using
>> design <- model.matrix(~Genotype+Disease+r)
>> 
>> I'm not sure on what design I should place to try and face the
>> question
>> and the simplest I could imagine:
>> design <- model.matrix(~Genotype+Disease+r)
>> 
>> does not work
>> 
>> I thank you very much for your supportive help
>> 
>> Michela
>> 
>> 
>
>>> From smyth at wehi.edu.au Fri Apr 18 11:43:34 2014
>>> Date: Fri, 18 Apr 2014 11:43:28 +1000 (AUS Eastern Standard Time)
>>> From: Gordon K Smyth <smyth at wehi.edu.au>
>>> To: Riba Michela <riba.michela at gmail.com>
>>> Cc: Bioconductor mailing list <bioconductor at r-project.org>, James W. 
> MacDonald <jmacdon at uw.edu>
>>> Subject: limma modeling, paired samples and continuous variable
>>> 
>>> 
>>>> Date: Thu, 17 Apr 2014 09:26:33 +0200
>>>> From: Riba Michela <riba.michela at gmail.com>
>>>> To: "James W. MacDonald" <jmacdon at uw.edu>
>>>> Cc: bioconductor at r-project.org
>>>> Subject: Re: [BioC] limma modeling, paired samples and continuous
>>>> 	variable
>>>> 
>>>> Hi,
>>>> thanks a lot for your kind answer.
>>>> I have to specify an additional observation:
>>>> the "r"parameter is indeed a numeric variable and also in this 
> situation
>>>> the result is the same.
>>> 
>>> Actually it is not possible to get the same message as before if you 
> have
>>> correctly code r as a numeric variable.
>>> 
>>>> Would be reasonable to try and model the design as:
>>>> design<- <- model.matrix(~0+r)
>>>> #where "r"is a numeric variable?
>>>> 
>>>> for the points about the coefficients I have to reason about
>>> 
>>> No.
>>> 
>>> To answer your question "if differential gene expression between two 
> classes
>>> of disease are correlated with the r status", you probably need a 
> Genotype:r
>>> iteraction term in your model.
>>> 
>>> You probably need to show us the whole targets frame for us to help you
>>> further.  In other words, we need to see:
>>>
>>>  data.frame(Genotype,Disease,r)
>>> 
>>> Best wishes
>>> Gordon
>>> 
>>>> Thanks a lot
>>>> 
>>>> Michela
>>>> Il giorno 15/apr/2014, alle ore 16:23, James W. MacDonald 
> <jmacdon at uw.edu>
>>>> ha scritto:
>>>> 
>>>>> Hi Michela,
>>>>> 
>>>>> On 4/15/2014 5:05 AM, michela riba [guest] wrote:
>>>>>> Hi,
>>>>>> I'm sorry for re-posting the message, but I cannot find it in the
>>>>>> archive
>>>>>> Thanks a lot for attention
>>>>>> 
>>>>>> 
>>>>>> Hi,
>>>>>> I would like to model and retrieve differential expression data
>>>>>> regarding an experimental design in which different patients (9) 
> have
>>>>>> different disease classes (3 disease classes) and a feature
>>>>>> represented with a percentage (0, 0.50, 0.75,1).
>>>>>>     some conditions are replicated 2 or 3 times, regarding the 
> disease
>>>>>> condition
>>>>>> Till now I have done an analysis considering Genotype and Disease in
>>>>>> the model (as a paired samples analysis)
>>>>>> 
>>>>>> design <- model.matrix(~Genotype+Disease)
>>>>>> or
>>>>>> design <- model.matrix(~0+Genotype+Disease)
>>>>>> 
>>>>>> now I would like to model also considering
>>>>>> a continuous variable , namely r
>>>>>> 
>>>>>> this way: design <- model.matrix(~Genotype+Disease+r)
>>>>>> 
>>>>>> to see if differential gene expression between two classes of 
> disease
>>>>>> are correlated with the r status
>>>>>> 
>>>>>> but till now it is not possible to gain results
>>>>>> Coefficients not estimable: r0,5 r0,75 r1
>>>>>> Warning message:
>>>>>> Partial NA coefficients for 15246 probe(s)
>>>>> 
>>>>> This indicates that R is using your 'r' data as factor rather than
>>>>> numeric. I assume that is not what you want? If so you need to ensure
>>>>> that R thinks that 'r' is a numeric vector.
>>>>> 
>>>>> If you really are trying to treat 'r' as a factor, then note that you
>>>>> have either an over-specified model (meaning you are trying to 
> estimate
>>>>> more parameters than you have observations), or that three of the
>>>>> coefficients for 'r' are linear combinations of existing coefficients
>>>>> when you already have genotype and disease in the model.
>>>>> 
>>>>> Best,
>>>>> 
>>>>> Jim
>>>>> 
>>>>> 
>>>>>> 
>>>>>> if I model
>>>>>> design <- model.matrix(~Disease+r)
>>>>>> it goes well, but  it would not consider the different genotypes
>>>>>> 
>>>>>> I thank you very much for attention
>>>>>> 
>>>>>> Thanks a lot
>>>>>> 
>>>>>> Michela
>>>>>>
>>>>>>  -- output of sessionInfo():
>>>>>> 
>>>>>> R version 3.0.2 (2013-09-25)
>>>>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>>>> 
>>>>>> locale:
>>>>>> [1] it_IT.UTF-8/it_IT.UTF-8/it_IT.UTF-8/C/it_IT.UTF-8/it_IT.UTF-8
>>>>>> 
>>>>>> attached base packages:
>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>> 
>>>>>> other attached packages:
>>>>>> [1] limma_3.18.13
>>>>>> 
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] tools_3.0.2
>>>>>> 
>>>>> --
>>>>> James W. MacDonald, M.S.
>>>>> Biostatistician
>>>>> University of Washington
>>>>> Environmental and Occupational Health Sciences
>>>>> 4225 Roosevelt Way NE, # 100
>>>>> Seattle WA 98105-6099
>>>>> 
>>>> 
>>>> Dr. Michela Riba
>>>> Genome Function Unit
>>>> Center for Translational Genomics and Bioinformatics
>>>> San Raffaele Scientific Institute
>>>> Via Olgettina 58
>>>> 20132 Milano
>>>> Italy
>>>> 
>>>> lab: +39 02 2643 9114
>>>> skype: mic_mir32
>>>> riba.michela at gmail.com
>>>> riba.michela at hsr.it
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list