[BioC] limma modeling, paired samples

Mon Jun 9 17:42:57 CEST 2014

Hi Riba,

On 6/9/2014 11:01 AM, Riba Michela wrote:
> Hi,
> thanks for your quick answer,
> I get the point,
>
> and basically I actually arrived to the conclusion of being absorbed in
> the Intercept,
> for this reason I went on and put
> 0+ in the model,
>
> in any case sure I'm not a statistician, and I cannot move on from this.
>
> I'm not at the moment convinced about the meaning of DiseaseB even if
> actually following your indication is right what I 'm intrested in.
> The point is that biologically the first patient is not a baseline
> For this reson I would not consider a model in which both p[atient and
> disease are put together in the intercept.
> Disease stageA is a baseline for disease but the first patient is not as
> often in clinical settings happens.

And this is exactly why I suggest you consult with a local statistician. 
What you have done is perfectly acceptable, but you don't understand 
enough to realize that.

Baseline in this context has nothing to do with any biological meaning 
for the term. Instead, it simply means that all other groups are 
compared to the baseline. You can use relevel() to change the baseline 
at will.

And the parameterization you are using to account for pairs requires 
that one of the subjects be considered a baseline. This is algebraically 
identical to fitting a conventional paired t-test, and you will not be 
able to fit a model that accounts for pairs without absorbing one 
subject into a baseline.

Best,

Jim

>
> In this light could you suggest another formula to extract in paired
> sample way (somehow considering that each pateint has his own variability)
> the genes which significantly differ among all the 3 disease stages?
> eg. DiseasestageA vs B
> DiseasestageA vs C
> Disease stage B vs C?
>
> Due to high interpatient variability it is very difficult to obtain
> results in not paired sample, Disease only based modeling.
>
> I thank you very much for your patient and hope you would give me feedback
>
> Thanks a lot,
>
> Michela
> Il giorno 09/giu/2014, alle ore 16:07, "James W. MacDonald"
> <jmacdon at uw.edu <mailto:jmacdon at uw.edu>> ha scritto:
>
>> Hi Riba,
>>
>> On 6/9/2014 8:14 AM, Riba Michela wrote:
>>> Hi,
>>> I'm writing again dealing with a paired sample design:
>>> the experimental setting involves 9 patients, 3 disease stages and
>>> microarray expression data
>>> according to the included target file
>>>
>>>
>>>
>>>
>>> target<- readTargets("targetPT.txt")
>>> head(target)
>>>
>>>
>>> Genotype <- factor(target$Genotype)
>>> Disease<- factor(target$Disease, levels=c("stageA", "stageB", "stageC"))
>>>
>>>  I have performed a paired samples analysis using
>>> *design <- model.matrix(~Genotype+Disease)*
>>> *
>>> *
>>> in order to sort out genes differentially expressed between stages A and
>>> B for example
>>> but I noticed that the first patient and the first disease stage (in
>>> alphabetical order) disappears in the fit
>>> using
>>> colnames (fit)
>>
>> The patient and disease don't disappear; they are absorbed into the
>> intercept term. The model you are fitting is called a 'factor effects'
>> model, and all the coefficients are interpreted as differences between
>> a given sample type and the 'baseline', which in this case is the
>> Stage A disease for Genotype 1.
>>
>> In other words:
>>
>> > colnames(design)
>> [1] "(Intercept)"   "DiseasestageB" "DiseasestageC" "Genotypept02"
>> "Genotypept03"  "Genotypept04"  "Genotypept06"
>> [8] "Genotypept09"  "Genotypept10"  "Genotypept13"  "Genotypept14"
>>
>> DiseasestageB can be interpreted as Stage B - Stage A after
>> controlling for the paired nature of your data. The DiseasestageC
>> coefficient is interpreted analogously.
>>
>> This is a basic concept of linear modeling, and if you are getting
>> tripped up on the basics then I would highly recommend finding a local
>> statistician who can help you.
>>
>> Best,
>>
>> Jim
>>
>>
>>>
>>> I tried to use
>>> *design <- model.matrix(~0+Genotype+Disease)*
>>> to explicit the coefficient in intercept
>>> and the first Disease type disappears
>>>
>>> I tried again
>>> *design <- model.matrix(~0+Disease+**Genotype**)*
>>> and again the first patient in alphabetical order disappears
>>>
>>> I do not have sufficient mathematical  education to understand exactly
>>> what shoud fit the needs
>>> I would prefer this last model formula to extract using a contrast
>>> matrix the differentially expressed genes between stages considering the
>>> variability due to different patients
>>> because it explicits all the disease stages,
>>> anyhow I would ask what could be the best way to address this problem
>>> and what could be the mistakes behind (i.e. I do not have all disease
>>> conditions for all the 9 patients,.. )
>>>
>>> I thank you very much for attention,
>>>
>>>
>>>
>>> Michela
>>>
>>>
>>> > sessionInfo()
>>> R version 3.0.2 (2013-09-25)
>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>
>>> locale:
>>> [1] it_IT.UTF-8/it_IT.UTF-8/it_IT.UTF-8/C/it_IT.UTF-8/it_IT.UTF-8
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] limma_3.18.13
>>>
>>> loaded via a namespace (and not attached):
>>> [1] tools_3.0.2
>>>
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>
> Dr. Michela Riba
> Genome Function Unit
> Center for Translational Genomics and Bioinformatics
> San Raffaele Scientific Institute
> Via Olgettina 58
> 20132 Milano
> Italy
>
> lab: +39 02 2643 9114
> skype: mic_mir32
> riba.michela at gmail.com <mailto:riba.michela at gmail.com>
> riba.michela at hsr.it
>
>
> Se avete ricevuto il presente messaggio per errore, Vi preghiamo di
> darne immediata comunicazione al mittente e di provvedere alla sua
> cancellazione dal vostro computer. Grazie
>
> If you have received this e-mail in error, please let the sender know
> and delete it from your computer. Thank you
>
>
>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099