[BioC] problems with paired design in limma

Michael Walter michael.walter at med.uni-tuebingen.de
Wed Nov 26 16:38:51 CET 2008


Hi Jim,

I fully agree that I should not block on patient when comparing MSA vs. controls. For those comparisons I fitted a model without the patient factor, and this worked fine. What we also want to see is the difference between brain regions within each disease, e.g. Cerebellum vs. Cortex in the patients with MSA. Here I'd like to pair the samples by donor. Alternatively, could I fit three independent models, one per disease, instead of putting everything into one model?
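
To make the question concrete, here is a minimal sketch of what the per-disease paired design might look like for the MSA donors. It uses base R only; the object names `msa` and `design.msa` are introduced just for this illustration, and the limma calls in the trailing comments assume a hypothetical index `msa.cols` that selects the MSA arrays in the same row order as the design.

```r
# MSA donors and brain regions, as listed in the targets table quoted below
msa <- expand.grid(patient = c("631", "769", "776", "881"),
                   region  = c("Cerebellum", "Cortex", "Putamen"))
patient <- factor(msa$patient)
region  <- factor(msa$region)

# Paired design: one blocking term per donor plus the region effects
design.msa <- model.matrix(~ patient + region)
colnames(design.msa)

# Every coefficient is estimable when the rank equals the number of columns
qr(design.msa)$rank == ncol(design.msa)

# With limma, the fit could then look like this (msa.cols is hypothetical):
# fit <- lmFit(data.norm[, msa.cols], design.msa)
# fit <- eBayes(fit)
# topTable(fit, coef = "regionCortex")   # Cortex vs. Cerebellum within donor
```

In this layout the coefficients `regionCortex` and `regionPutamen` would be the within-donor differences relative to Cerebellum, provided the array columns are ordered to match the rows of the design.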

Best Regards,

Mike


> Hi Mike,
> 
> Michael Walter wrote:
> > 
> > Dear List,
> > 
> > This is one of the hundreds of "how do I create a design matrix in
> > limma" questions. I am having difficulties setting up a paired
> > design and get error messages that I really do not understand. The
> > experiment consists of 27 U133A arrays from 9 patients with 3
> > different conditions (2 diseases plus healthy controls). From each
> > patient we have 3 different brain regions. I want to compare the
> > difference between the brain regions in the different diseases.
> > Therefore I want to match the samples from the individual patients. I
> > attached the code below. When I try to fit the model with lmFit I get
> > the following error message:
> > 
> >> fit <- lmFit(data.norm, design)
> > Coefficients not estimable: sample_881 sample_936
> > Warning message:
> > In lmFit(data.norm, design) :
> >   Some coefficients not estimable: coefficient interpretation may vary.
> > 
> > What I don't understand is why I can calculate the coefficients for
> > all but 2 samples. I have already double-checked my target file and
> > design matrix and can't find any clue as to what might be wrong with
> > these two samples, so any hint is highly appreciated.
> 
> There is nothing wrong with these samples per se. The problem arises 
> from the fact that you are trying to compute estimates for too many 
> parameters, so lmFit() is informing you of this problem.
> 
> When you are fitting a linear model, in essence what you are doing is 
> solving equations for multiple unknown quantities. Algebraically you 
> need one equation (or set of data) per unknown quantity. So for 
> instance, you can solve for x with one equation, but you can't solve for 
> x and y with one equation; you need two.
> 
> However, you can solve for some combination of x and y with just one 
> equation:
> 
> x - y + 4 = 25 => x - y = 21
> 
> So what is happening is that one or more of your coefficients may be the 
> difference between two parameter estimates, rather than the estimate of 
> a single parameter, which is what the 'coefficient interpretation may 
> vary' warning is hinting at.
> 
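
To make the estimability point concrete, here is a minimal base-R sketch that rebuilds the design from the targets table quoted further down and compares its number of columns with its rank. The `disease_of` lookup is a helper introduced only for this illustration; the remaining lines follow the original code.

```r
# Patient IDs in the row order of the targets table
patient <- c(628, 631, 650, 755, 758, 769, 776, 881, 936,   # Cerebellum
             936, 628, 881, 631, 650, 755, 758, 769, 776,   # Cortex
             628, 631, 650, 755, 758, 769, 776, 881, 936)   # Putamen

# Each donor has exactly one disease status (helper for this sketch only)
disease_of <- c("628" = "PD",  "631" = "MSA", "650" = "PD",
                "755" = "PD",  "758" = "Co",  "769" = "MSA",
                "776" = "MSA", "881" = "MSA", "936" = "Co")
disease <- unname(disease_of[as.character(patient)])

condition <- as.factor(paste(disease, rep(c("Cbm", "Cor", "Ptm"), each = 9), sep = "."))
sample    <- as.factor(paste("_", patient, sep = ""))
design    <- model.matrix(~0 + condition + sample)

ncol(design)      # 17 columns: 9 conditions + 8 patient terms
qr(design)$rank   # 15: two columns are linear combinations of the others
```

Because every donor belongs to exactly one disease, the patient indicators within a disease add up to that disease's indicator, which the condition columns already span. That leaves two redundant patient columns, consistent with lmFit() reporting sample_881 and sample_936 (the last MSA donor and the last control donor) as not estimable.
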
> I don't think you want to block these data on patient anyway. It seems 
> to me that you have patients with various diseases from whom you have 
> sampled brain tissue from various regions of the brain. So if you want 
> to e.g., compare the expression of genes in the cerebellum of people 
> with MSA to Co, then there is no blocking to be done because people 
> either have MSA or Co, but nobody has both.
> 
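
A minimal sketch of the unblocked comparison described above, assuming the same condition labels as in the original code. The contrast is written out by hand in base R; the `contrast` vector and the limma calls in the trailing comments are illustrative only.

```r
# Disease labels in the row order of the targets table
disease <- c("PD", "MSA", "PD", "PD", "Co", "MSA", "MSA", "MSA", "Co",   # Cerebellum
             "Co", "PD", "MSA", "MSA", "PD", "PD", "Co", "MSA", "MSA",   # Cortex
             "PD", "MSA", "PD", "PD", "Co", "MSA", "MSA", "MSA", "Co")   # Putamen
region    <- rep(c("Cbm", "Cor", "Ptm"), each = 9)
condition <- factor(paste(disease, region, sep = "."))

# One coefficient per disease-by-region group, no patient term
design <- model.matrix(~0 + condition)
colnames(design) <- levels(condition)

# Contrast: MSA cerebellum minus control cerebellum
contrast <- setNames(numeric(ncol(design)), colnames(design))
contrast[c("MSA.Cbm", "Co.Cbm")] <- c(1, -1)
contrast

# With limma this could be used as follows:
# fit  <- lmFit(data.norm, design)
# fit2 <- eBayes(contrasts.fit(fit, contrast))
```

Without the patient term the design has nine columns and full rank, so every group mean is estimable and the between-disease contrast is well defined.
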
> Best,
> 
> Jim
> 
> > 
> > Best Regards,
> > 
> > Mike
> > 
> > 
> > 
> > Here is the code I used:
> > 
> >> target
> >    File                disease patient region
> > 1  "Cbm 628 U133A.CEL" PD      628     Cerebellum
> > 2  "Cbm 631 U133A.CEL" MSA     631     Cerebellum
> > 3  "Cbm 650 U133A.CEL" PD      650     Cerebellum
> > 4  "Cbm 755 U133A.CEL" PD      755     Cerebellum
> > 5  "Cbm 758 U133A.CEL" Co      758     Cerebellum
> > 6  "Cbm 769 U133A.CEL" MSA     769     Cerebellum
> > 7  "Cbm 776 U133A.CEL" MSA     776     Cerebellum
> > 8  "Cbm 881 U133A.CEL" MSA     881     Cerebellum
> > 9  "Cbm 936 U133A.CEL" Co      936     Cerebellum
> > 10 "E4R_042a12b.CEL"   Co      936     Cortex
> > 11 "I4R_012a1.CEL"     PD      628     Cortex
> > 12 "I4R_012a11.CEL"    MSA     881     Cortex
> > 13 "I4R_012a2.CEL"     MSA     631     Cortex
> > 14 "I4R_012a3.CEL"     PD      650     Cortex
> > 15 "I4R_012a6.CEL"     PD      755     Cortex
> > 16 "I4R_012a7.CEL"     Co      758     Cortex
> > 17 "I4R_012a8.CEL"     MSA     769     Cortex
> > 18 "I4R_012a9.CEL"     MSA     776     Cortex
> > 19 "pn0628_133a.CEL"   PD      628     Putamen
> > 20 "pn0631_133a.CEL"   MSA     631     Putamen
> > 21 "pn0650_133a.CEL"   PD      650     Putamen
> > 22 "pn0755_133a.CEL"   PD      755     Putamen
> > 23 "pn0758_133a.CEL"   Co      758     Putamen
> > 24 "pn0769_133a.CEL"   MSA     769     Putamen
> > 25 "pn0776_133a.CEL"   MSA     776     Putamen
> > 26 "pn0881_133a.CEL"   MSA     881     Putamen
> > 27 "pn0936_133a.CEL"   Co      936     Putamen
> > 
> >> condition <- as.factor(paste(disease, rep(c("Cbm", "Cor", "Ptm"), each=9), sep="."))
> >> sample <- as.factor(paste("_", patient, sep=""))
> >> design <- model.matrix(~0+condition+sample)
> >> colnames(design)[1:9] <- sort(as.character(unique(condition)))
> >> fit <- lmFit(data.norm, design)
> > Coefficients not estimable: sample_881 sample_936
> > Warning message:
> > In lmFit(data.norm, design) :
> >   Some coefficients not estimable: coefficient interpretation may vary.
> >> sessionInfo()
> > R version 2.7.0 (2008-04-22)
> > i386-pc-mingw32
> > 
> > locale:
> > LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252
> > 
> > attached base packages:
> > [1] tools     stats     graphics  grDevices utils     datasets  methods
> > [8] base
> > 
> > other attached packages:
> > [1] affy_1.18.2          preprocessCore_1.2.0 affyio_1.8.0
> > [4] Biobase_2.0.1        limma_2.14.5
> > 
> > loaded via a namespace (and not attached):
> > [1] scatterplot3d_0.3-27
> > 
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> Hildebrandt Lab
> 8220D MSRB III
> 1150 W. Medical Center Drive
> Ann Arbor MI 48109-0646
> 734-936-8662
> 

-- 
Dr. Michael Walter

The Microarray Facility
University of Tuebingen
Calwerstr. 7
72076  Tübingen/GERMANY

Tel.: +49 (0) 7071 29 83210
Fax. + 49 (0) 7071 29 5228