[BioC] limma modeling, paired samples
bernd.klaus at embl.de
Mon Jun 9 17:34:10 CEST 2014
On Jun 9, 2014, at 5:01 PM, Riba Michela <riba.michela at gmail.com> wrote:
> thanks for your quick answer,
> I get the point,
> and basically I actually arrived to the conclusion of being absorbed in the Intercept,
> for this reason I went on and put
> 0+ in the model,
> in any case sure I'm not a statistician, and I cannot move on from this.
> I'm not at the moment convinced about the meaning of DiseaseB even if actually following your indication is right what I 'm intrested in.
> The point is that biologically the first patient is not a baseline
> For this reson I would not consider a model in which both p[atient and disease are put together in the intercept.
> Disease stageA is a baseline for disease but the first patient is not as often in clinical settings happens.
> In this light could you suggest another formula to extract in paired sample way (somehow considering that each pateint has his own variability)
> the genes which significantly differ among all the 3 disease stages?
> eg. DiseasestageA vs B
> DiseasestageA vs C
> Disease stage B vs C?
Well, as Jim said the coefficient "DiseasestageB" is B-A, accordingly "DiseasestageC" is C-A.
To get B-C you have to extract the contrast "DiseasestageB - DiseasestageC"
which is B-A -(C-A) = B-C.
In this factor model you assume that the effects are additive, so "contrasting" two coefficients
that are relative to the same genotype base level gives you the difference in mean expression
explained by disease independent of genotype.
I sent you in another mail some Teaching material of mine that explains this in more
detail. (I will put this on github soon)
> Due to high interpatient variability it is very difficult to obtain results in not paired sample, Disease only based modeling.
> I thank you very much for your patient and hope you would give me feedback
> Thanks a lot,
> Il giorno 09/giu/2014, alle ore 16:07, "James W. MacDonald" <jmacdon at uw.edu> ha scritto:
>> Hi Riba,
>> On 6/9/2014 8:14 AM, Riba Michela wrote:
>>> I'm writing again dealing with a paired sample design:
>>> the experimental setting involves 9 patients, 3 disease stages and
>>> microarray expression data
>>> according to the included target file
>>> target<- readTargets("targetPT.txt")
>>> Genotype <- factor(target$Genotype)
>>> Disease<- factor(target$Disease, levels=c("stageA", "stageB", "stageC"))
>>> I have performed a paired samples analysis using
>>> *design <- model.matrix(~Genotype+Disease)*
>>> in order to sort out genes differentially expressed between stages A and
>>> B for example
>>> but I noticed that the first patient and the first disease stage (in
>>> alphabetical order) disappears in the fit
>>> colnames (fit)
>> The patient and disease don't disappear; they are absorbed into the intercept term. The model you are fitting is called a 'factor effects' model, and all the coefficients are interpreted as differences between a given sample type and the 'baseline', which in this case is the Stage A disease for Genotype 1.
>> In other words:
>>  "(Intercept)" "DiseasestageB" "DiseasestageC" "Genotypept02" "Genotypept03" "Genotypept04" "Genotypept06"
>>  "Genotypept09" "Genotypept10" "Genotypept13" "Genotypept14"
>> DiseasestageB can be interpreted as Stage B - Stage A after controlling for the paired nature of your data. The DiseasestageC coefficient is interpreted analogously.
>> This is a basic concept of linear modeling, and if you are getting tripped up on the basics then I would highly recommend finding a local statistician who can help you.
>>> I tried to use
>>> *design <- model.matrix(~0+Genotype+Disease)*
>>> to explicit the coefficient in intercept
>>> and the first Disease type disappears
>>> I tried again
>>> *design <- model.matrix(~0+Disease+**Genotype**)*
>>> and again the first patient in alphabetical order disappears
>>> I do not have sufficient mathematical education to understand exactly
>>> what shoud fit the needs
>>> I would prefer this last model formula to extract using a contrast
>>> matrix the differentially expressed genes between stages considering the
>>> variability due to different patients
>>> because it explicits all the disease stages,
>>> anyhow I would ask what could be the best way to address this problem
>>> and what could be the mistakes behind (i.e. I do not have all disease
>>> conditions for all the 9 patients,.. )
>>> I thank you very much for attention,
>>> R version 3.0.2 (2013-09-25)
>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>  it_IT.UTF-8/it_IT.UTF-8/it_IT.UTF-8/C/it_IT.UTF-8/it_IT.UTF-8
>>> attached base packages:
>>>  stats graphics grDevices utils datasets methods base
>>> other attached packages:
>>>  limma_3.18.13
>>> loaded via a namespace (and not attached):
>>>  tools_3.0.2
>> James W. MacDonald, M.S.
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
> Dr. Michela Riba
> Genome Function Unit
> Center for Translational Genomics and Bioinformatics
> San Raffaele Scientific Institute
> Via Olgettina 58
> 20132 Milano
> lab: +39 02 2643 9114
> skype: mic_mir32
> riba.michela at gmail.com
> riba.michela at hsr.it
> Se avete ricevuto il presente messaggio per errore, Vi preghiamo di darne immediata comunicazione al mittente e di provvedere alla sua cancellazione dal vostro computer. Grazie
> If you have received this e-mail in error, please let the sender know and delete it from your computer. Thank you
> [[alternative HTML version deleted]]
> Bioconductor mailing list
> Bioconductor at r-project.org
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor