[BioC] problems with paired design in limma

Wed Nov 26 15:38:31 CET 2008

Hi Mike,

Michael Walter wrote:
> 
> Dear List,
> 
> This one of the hundreds of "how do I create a design matrix in limma
> question". However, I have difficulties in setting up a paired
> design, with some error messages I really do not understand. The
> experiment consists of 27 U133A arrays from 9 patients with 3
> different conditions (2 diseases plus healthy controls). From each
> patient we have 3 different brain regions. I want to compare the
> difference between the brain regions in the different diseases.
> therefore I want to match the samples from the individual patients. I
> attached the code below. When I try to fit the model with lmFit I get
> following error message:
> 
>> fit <- lmFit(data.norm, design)
> Coefficients not estimable: sample_881 sample_936 Warning message: In
> lmFit(data.norm, design) : Some coefficients not estimable:
> coefficient interpretation may vary.
> 
> What I dont understand is why can I calculate the coefficients for
> all but 2 samples? I allready doublechecked my target file and design
> matrix and can't find any clue what might be wrong with these two
> samples, so any hint is highly appreciated.

There is nothing wrong with these samples per se. The problem arises 
from the fact that you are trying to compute estimates for too many 
parameters, so lmFit() is informing you of this problem.

When you are fitting a linear model, in essence what you are doing is 
solving equations for multiple unknown quantities. Algebraically you 
need one equation (or set of data) per unknown quantity. So for 
instance, you can solve for x with one equation, but you can't solve for 
x and y with one equation, you need two.

However, you can solve for some combination of x and y with just one 
equation:

x - y + 4 = 25 => x - y = 21

So what is happening is that one or more of your coefficients may be the 
difference between two parameter estimates, rather than the estimate of 
a single parameter. Which is what the 'coefficient interpretation may 
vary' is hinting at.

I don't think you want to block these data on patient anyway. It seems 
to me that you have patients with various diseases from whom you have 
sampled brain tissue from various regions of the brain. So if you want 
to e.g., compare the expression of genes in the cerebellum of people 
with MSA to Co, then there is no blocking to be done because people 
either have MSA or Co, but nobody has both.

Best,

Jim

> 
> Best Regards,
> 
> Mike
> 
> 
> 
> Here is the code I used:
> 
>> target
> File disease patient region 1 "Cbm 628 U133A.CEL" PD 628 Cerebellum 2
> "Cbm 631 U133A.CEL" MSA 631 Cerebellum 3 "Cbm 650 U133A.CEL" PD 650
> Cerebellum 4 "Cbm 755 U133A.CEL" PD 755 Cerebellum 5 "Cbm 758
> U133A.CEL" Co 758 Cerebellum 6 "Cbm 769 U133A.CEL" MSA 769 Cerebellum
>  7 "Cbm 776 U133A.CEL" MSA 776 Cerebellum 8 "Cbm 881 U133A.CEL" MSA
> 881 Cerebellum 9 "Cbm 936 U133A.CEL" Co 936 Cerebellum 10
> "E4R_042a12b.CEL" Co 936 Cortex 11 "I4R_012a1.CEL" PD 628 Cortex 12
> "I4R_012a11.CEL" MSA 881 Cortex 13 "I4R_012a2.CEL" MSA 631 Cortex 14
> "I4R_012a3.CEL" PD 650 Cortex 15 "I4R_012a6.CEL" PD 755 Cortex 16
> "I4R_012a7.CEL" Co 758 Cortex 17 "I4R_012a8.CEL" MSA 769 Cortex 18
> "I4R_012a9.CEL" MSA 776 Cortex 19 "pn0628_133a.CEL" PD 628 Putamen 20
> "pn0631_133a.CEL" MSA 631 Putamen 21 "pn0650_133a.CEL" PD 650 Putamen
>  22 "pn0755_133a.CEL" PD 755 Putamen 23 "pn0758_133a.CEL" Co 758
> Putamen 24 "pn0769_133a.CEL" MSA 769 Putamen 25 "pn0776_133a.CEL" MSA
> 776 Putamen 26 "pn0881_133a.CEL" MSA 881 Putamen 27 "pn0936_133a.CEL"
> Co 936 Putamen
> 
>> condition <- as.factor(paste(disease, rep(c("Cbm", "Cor", "Ptm"),
>> each=9), sep=".")) sample <- as.factor(paste("_", patient, sep=""))
>> 
>> 
>> design <- model.matrix(~0+condition+sample) colnames(design)[1:9]
>> <- sort(as.character(unique(condition))) fit <- lmFit(data.norm,
>> design)
> Coefficients not estimable: sample_881 sample_936 Warning message: In
> lmFit(data.norm, design) : Some coefficients not estimable:
> coefficient interpretation may vary.
>> sessionInfo()
> R version 2.7.0 (2008-04-22) i386-pc-mingw32
> 
> locale: 
> LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252
> 
> 
> attached base packages: [1] tools stats graphics grDevices utils
> datasets methods [8] base
> 
> other attached packages: [1] affy_1.18.2 preprocessCore_1.2.0
> affyio_1.8.0 [4] Biobase_2.0.1 limma_2.14.5
> 
> loaded via a namespace (and not attached): [1] scatterplot3d_0.3-27
> 
> 
> 
> ------------------------------------------------------------------------
> 
> 
> _______________________________________________ Bioconductor mailing
> list Bioconductor at stat.math.ethz.ch 
> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
> archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662