[BioC] limma modeling, paired samples and continuous variable

Gordon K Smyth smyth at wehi.EDU.AU
Sat Apr 19 01:53:33 CEST 2014

Dear Riba,

Well, a couple of points.

First, if you want to treat Condition as numeric, then you must not 
declare it to be a factor.  In R, a "factor" is a variable that is 
categorical instead of numeric.

Second, Condition is entirely confounded with Genotype in your experiment. 
Samples for the same Genotype always have the same Condition. For example, 
all samples of Genotype pt01 have Condition=0, all samples of Genotype 
pt06 have Condition=0.5, and so on.  Hence you cannot include Condition 
and Genotype in the same model, because they are giving the same 
information.  If you adjust for Genotype in the model, then you have 
necessarily also adjusted for the Condition.

You need:

   target<- readTargets("targetPTpGSp.txt")
   Genotype <- factor(target$Genotype)
   Disease<- factor(target$Disease)
   Condition <- target$Condition

Then can use either:

   design <- model.matrix(~Genotype+Disease)


   design <- mode.matrix(~Condition+Disease)

Best wishes

>Date: Fri, 18 Apr 2014 11:19:56 +0200
>From: Riba Michela <riba.michela at gmail.com>
>To: Gordon K Smyth <smyth at wehi.edu.au>
>Cc: Bioconductor mailing list <bioconductor at r-project.org>, James W. 
MacDonald <jmacdon at uw.edu>
>Subject: Re: limma modeling, paired samples and continuous variable
>thanks a lot for your answer and I'm forwarding the covariate matrix
>of our design.
>target<- readTargets("targetPTpGSp.txt")
>Genotype <- factor(target$Genotype)
>Disease<- factor(target$Disease, levels=c("stageA", "stageB",
># Condition <-factor(target$Condition)
>r<-target$Condition #this should be numeric
>I'm just recalling the most striking parts of what I ideally would
>try to do and what I have already did.
>Till now I have performed a paired samples analysis using
>design <- model.matrix(~Genotype+Disease)
>but I would like to include also a continuous parameter ("Condition")
>in the model because it seems
>that differentially expressed genes in two different stages of the
>disease e.g. "stageB" results in the fit , coming from the above
>specified paired sample design and indicating stageB-stageA
>differentially expressed genes)
>somehow correlate with the "Condition"parameter
>At the model I could make it function using
>design<- model.matrix(~Disease+r)
>but not using
>design <- model.matrix(~Genotype+r)
>nor using
>design <- model.matrix(~Genotype+Disease+r)
>I'm not sure on what design I should place to try and face the
>and the simplest I could imagine:
>design <- model.matrix(~Genotype+Disease+r)
>does not work
>I thank you very much for your supportive help

>>From smyth at wehi.edu.au Fri Apr 18 11:43:34 2014
>>Date: Fri, 18 Apr 2014 11:43:28 +1000 (AUS Eastern Standard Time)
>>From: Gordon K Smyth <smyth at wehi.edu.au>
>>To: Riba Michela <riba.michela at gmail.com>
>>Cc: Bioconductor mailing list <bioconductor at r-project.org>, James W. 
MacDonald <jmacdon at uw.edu>
>>Subject: limma modeling, paired samples and continuous variable
>>> Date: Thu, 17 Apr 2014 09:26:33 +0200
>>> From: Riba Michela <riba.michela at gmail.com>
>>> To: "James W. MacDonald" <jmacdon at uw.edu>
>>> Cc: bioconductor at r-project.org
>>> Subject: Re: [BioC] limma modeling, paired samples and continuous
>>> 	variable
>>> Hi,
>>> thanks a lot for your kind answer.
>>> I have to specify an additional observation:
>>> the "r"parameter is indeed a numeric variable and also in this 
>>> the result is the same.
>>Actually it is not possible to get the same message as before if you 
>>correctly code r as a numeric variable.
>>> Would be reasonable to try and model the design as:
>>> design<- <- model.matrix(~0+r)
>>> #where "r"is a numeric variable?
>>> for the points about the coefficients I have to reason about
>>To answer your question "if differential gene expression between two 
>>of disease are correlated with the r status", you probably need a 
>>iteraction term in your model.
>>You probably need to show us the whole targets frame for us to help you
>>further.  In other words, we need to see:
>>  data.frame(Genotype,Disease,r)
>>Best wishes
>>> Thanks a lot
>>> Michela
>>> Il giorno 15/apr/2014, alle ore 16:23, James W. MacDonald 
<jmacdon at uw.edu>
>>> ha scritto:
>>>> Hi Michela,
>>>> On 4/15/2014 5:05 AM, michela riba [guest] wrote:
>>>>> Hi,
>>>>> I'm sorry for re-posting the message, but I cannot find it in the
>>>>> archive
>>>>> Thanks a lot for attention
>>>>> Hi,
>>>>> I would like to model and retrieve differential expression data
>>>>> regarding an experimental design in which different patients (9) 
>>>>> different disease classes (3 disease classes) and a feature
>>>>> represented with a percentage (0, 0.50, 0.75,1).
>>>>>     some conditions are replicated 2 or 3 times, regarding the 
>>>>> condition
>>>>> Till now I have done an analysis considering Genotype and Disease in
>>>>> the model (as a paired samples analysis)
>>>>> design <- model.matrix(~Genotype+Disease)
>>>>> or
>>>>> design <- model.matrix(~0+Genotype+Disease)
>>>>> now I would like to model also considering
>>>>> a continuous variable , namely r
>>>>> this way: design <- model.matrix(~Genotype+Disease+r)
>>>>> to see if differential gene expression between two classes of 
>>>>> are correlated with the r status
>>>>> but till now it is not possible to gain results
>>>>> Coefficients not estimable: r0,5 r0,75 r1
>>>>> Warning message:
>>>>> Partial NA coefficients for 15246 probe(s)
>>>> This indicates that R is using your 'r' data as factor rather than
>>>> numeric. I assume that is not what you want? If so you need to ensure
>>>> that R thinks that 'r' is a numeric vector.
>>>> If you really are trying to treat 'r' as a factor, then note that you
>>>> have either an over-specified model (meaning you are trying to 
>>>> more parameters than you have observations), or that three of the
>>>> coefficients for 'r' are linear combinations of existing coefficients
>>>> when you already have genotype and disease in the model.
>>>> Best,
>>>> Jim
>>>>> if I model
>>>>> design <- model.matrix(~Disease+r)
>>>>> it goes well, but  it would not consider the different genotypes
>>>>> I thank you very much for attention
>>>>> Thanks a lot
>>>>> Michela
>>>>>  -- output of sessionInfo():
>>>>> R version 3.0.2 (2013-09-25)
>>>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>>> locale:
>>>>> [1] it_IT.UTF-8/it_IT.UTF-8/it_IT.UTF-8/C/it_IT.UTF-8/it_IT.UTF-8
>>>>> attached base packages:
>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>> other attached packages:
>>>>> [1] limma_3.18.13
>>>>> loaded via a namespace (and not attached):
>>>>> [1] tools_3.0.2
>>>> --
>>>> James W. MacDonald, M.S.
>>>> Biostatistician
>>>> University of Washington
>>>> Environmental and Occupational Health Sciences
>>>> 4225 Roosevelt Way NE, # 100
>>>> Seattle WA 98105-6099
>>> Dr. Michela Riba
>>> Genome Function Unit
>>> Center for Translational Genomics and Bioinformatics
>>> San Raffaele Scientific Institute
>>> Via Olgettina 58
>>> 20132 Milano
>>> Italy
>>> lab: +39 02 2643 9114
>>> skype: mic_mir32
>>> riba.michela at gmail.com
>>> riba.michela at hsr.it

The information in this email is confidential and intend...{{dropped:4}}

More information about the Bioconductor mailing list