[BioC] Problem fitting a linear model using limma

James W. MacDonald jmacdon at uw.edu
Wed Mar 7 18:01:06 CET 2012


Hi Joaquin,

On 3/7/2012 11:08 AM, Joaquin Martinez wrote:
> Dear All,
>
> My colleague Ben and I have fitted a linear model to microarray data using
> the limma package but we get "coefficients not estimable" for the last two
> columns in the design matrix during the fit. As a consequence when trying
> to fit our contrast matrix we get the following error message, "Error in
> contrasts.fit(fit, contrasts = A) : trying to take contrast of
> non-estimable coefficient."  We have discovered that if we simply reorder
> the design matrix columns and contrasts matrix rows, we encounter the same
> error. We would very much appreciate if someone could explain to us why the
> not-estimable coefficients arise, and how to correct the problem.
>
> We are following the guide (section 8.2) for limma 3.10.0 using this
> version of R 2.14.0 (sessionInfo at end of email).  We have 6 different
> types of samples (Puninf, P86, P163, Muninf, M86, M163), and 3 biological
> replicates of each (1,2,3; 4,5,6; 7,8,9; 10,11,12; 13,14,15; 16,17,18).

Yes, but your targets file has numbers appended to the sample types, 
which makes R think they are different. Hypothetically you would have 
noticed this when using the modelMatrix() function:

 > modelMatrix(targets, ref = "Puninf.1")
Found unique target names:
  M163.16 M163.17 M163.18 M86.13 M86.14 M86.15 Muninf.10 Muninf.11 
Muninf.12 P163.7 P163.8 P163.9 P86.4 P86.5 P86.6 Puninf.1 Puninf.2 Puninf.3

This indicates that limma thinks all of your replicates are different 
sample types. This won't work out (as you have already found) because 
the model will be over specified, which simply means that you cannot 
estimate all the parameters you have in the model. This is the same as 
saying you can't determine the values of x and y in the formula x + y  = 
3 (but if you also know that x - y = -1, then you can solve for both x 
and y).

Now back to the problem at hand. If you remove the numbers from your 
sample types,

targets$Cy3 <- sapply(strsplit(targets$Cy3, "\\."), function(x) x[1])
targets$Cy5 <- sapply(strsplit(targets$Cy5, "\\."), function(x) x[1])

then you will create the correct model matrix

 > modelMatrix(targets, ref="Puninf")
Found unique target names:
  M163 M86 Muninf P163 P86 Puninf
       M163 M86 Muninf P163 P86
  [1,]    0   0      1    0   0
  [2,]    0   0      0    0   1
  [3,]    0   0      0   -1   0
  [4,]    0  -1      1    0   0
  [5,]    1   0     -1    0   0
  [6,]    0   1      0    0  -1
  [7,]   -1   0      0    1   0
  [8,]    0   0     -1    0   0
  [9,]    0   0      0    0  -1
[10,]    0   0      0    1   0
[11,]    0   1     -1    0   0
[12,]   -1   0      1    0   0
[13,]    0  -1      0    0   1
[14,]    1   0      0   -1   0
[15,]    0   0      1    0   0
[16,]    0   0      0    0   1
[17,]    0   0      0   -1   0
[18,]    0  -1      1    0   0
[19,]    1   0     -1    0   0
[20,]    0   1      0    0  -1
[21,]   -1   0      0    1   0

Which will compare all samples to Puninf.

Best,

Jim





>
> # The targets file looks like this
>
> targets<- structure(list(SlideNumber = c(29L, 30L, 31L, 32L, 33L, 34L,
>     35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 51L, 46L, 47L,
>     48L, 50L), FileName = c("BE T22h slide 29", "BE T22h slide 30",
>     "BE T22h slide 31", "BE T22h slide 32", "BE T22h slide 33", "BE T22h
> slide 34",
>     "BE T22h slide 35", "BE T22h slide 36", "BE T22h slide 37", "BE T22h
> slide 38",
>     "BE T22h slide 39", "BE T22h slide 40", "BE T22h slide 41", "BE T22h
> slide 42",
>     "BE T22h slide 43", "BE T22h slide 44", "BE T22h slide 51", "BE T22h
> slide 46",
>     "BE T22h slide 47", "BE T22h slide 48", "BE T22h slide 50"),
>      Cy3 = c("Puninf.1", "Puninf.1", "P163.7", "M86.13", "Muninf.10",
>      "P86.4", "M163.16", "Muninf.11", "P86.5", "Puninf.2", "Muninf.11",
>      "M163.17", "M86.14", "P163.8", "Puninf.3", "Puninf.3", "P163.9",
>      "M86.15", "Muninf.12", "P86.6", "M163.18"), Cy5 = c("Muninf.10",
>      "P86.4", "Puninf.1", "Muninf.10", "M163.16", "M86.13", "P163.7",
>      "Puninf.2", "Puninf.2", "P163.8", "M86.14", "Muninf.11",
>      "P86.5", "M163.17", "Muninf.12", "P86.6", "Puninf.3", "Muninf.12",
>      "M163.18", "M86.15", "P163.9")), .Names = c("SlideNumber",
>     "FileName", "Cy3", "Cy5"), class = "data.frame", row.names = c(NA,
>     -21L))
>
> # for readability we also paste it in here...
> #>  targets
> #     SlideNumber         FileName       Cy3       Cy5
> #  1           29 BE T22h slide 29  Puninf.1 Muninf.10
> #  2           30 BE T22h slide 30  Puninf.1     P86.4
> #  3           31 BE T22h slide 31    P163.7  Puninf.1
> #  4           32 BE T22h slide 32    M86.13 Muninf.10
> #  5           33 BE T22h slide 33 Muninf.10   M163.16
> #  6           34 BE T22h slide 34     P86.4    M86.13
> #  7           35 BE T22h slide 35   M163.16    P163.7
> #  8           36 BE T22h slide 36 Muninf.11  Puninf.2
> #  9           37 BE T22h slide 37     P86.5  Puninf.2
> #  10          38 BE T22h slide 38  Puninf.2    P163.8
> #  11          39 BE T22h slide 39 Muninf.11    M86.14
> #  12          40 BE T22h slide 40   M163.17 Muninf.11
> #  13          41 BE T22h slide 41    M86.14     P86.5
> #  14          42 BE T22h slide 42    P163.8   M163.17
> #  15          43 BE T22h slide 43  Puninf.3 Muninf.12
> #  16          44 BE T22h slide 44  Puninf.3     P86.6
> #  17          51 BE T22h slide 51    P163.9  Puninf.3
> #  18          46 BE T22h slide 46    M86.15 Muninf.12
> #  19          47 BE T22h slide 47 Muninf.12   M163.18
> #  20          48 BE T22h slide 48     P86.6    M86.15
> #  21          50 BE T22h slide 50   M163.18    P163.9
>
> # The design matrix is computed like this design<- modelMatrix(targets,
> ref = "Puninf.1")
> # but here is the structure if it is helpful
> design<- structure(c(0, 0, 0, 0, 1, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 1, 0,
>     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>     0, 0, 0, 1, 0, -1, 0, 0, 0, -1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
>     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, -1,
>     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>     0, 0, 0, 0, -1, 0, 1, 0, 1, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0,
>     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1,
>     1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>     0, 0, 0, 1, 0, 0, 1, -1, 0, 0, 0, 0, -1, 0, 0, 0, 1, 0, 0, 0,
>     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
>     0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>     0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 1, 0, 1, 0, 0, 0, -1, 0, 0,
>     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>     -1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>     0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0,
>     0, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>     0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, 1, 0, 0, 0, 0), .Dim = c(21L,
>     17L), .Dimnames = list(c("slide 29", "slide 30", "slide 31",
>     "slide 32", "slide 33", "slide 34", "slide 35", "slide 36", "slide 37",
>     "slide 38", "slide 39", "slide 40", "slide 41", "slide 42", "slide 43",
>     "slide 44", "slide 51", "slide 46", "slide 47", "slide 48", "slide 50"
>     ), c("M163.16", "M163.17", "M163.18", "M86.13", "M86.14", "M86.15",
>     "Muninf.10", "Muninf.11", "Muninf.12", "P163.7", "P163.8", "P163.9",
>     "P86.4", "P86.5", "P86.6", "Puninf.2", "Puninf.3")))
>
>
> # Fit linear model - commented out becuase we can't really send all the
> data in MA
> # fit<- lmFit(MA, design = design)
> # Coefficients not estimable: Puninf.2 Puninf.3
>
> # Make a contrast matrix:
> A<-
> makeContrasts(PuninfvsMuninf=(Muninf.10+Muninf.11+Muninf.12-Puninf.2-Puninf.3)/3,
>           levels = design)
>
> # and now fit the contrast
> con.fit<- contrasts.fit(fit, contrasts = A)
> Error in contrasts.fit(fit, contrasts = A) : trying to take contrast of
> non-estimable coefficient
>
> # now change the column order in the design matrix and row order of
> contrast matrix
> # we create a new sort index - arbitrarily placing the last two columns in
> the original
> # design matrix in the middle of the new one.  Ditto for the rows of the
> contrast matrix.
> n<- nrow(A)
> newIndex<- c(1:5, n-1, n, 6:(n-2))
> designB<- design[,newIndex]
> B<- A[newIndex,]
> attributes(B)<- attributes(A)
>
> # create a new fit, but we get the same knd of error, the last two columns
> are
> # identified as non-estimable
> fitB<- lmFit(X$MA, design = designB)
> #Coefficients not estimable: P86.5 P86.6
> #Warning message:
> #Partial NA coefficients for 16278 probe(s)
>
>
> As far as we can tell, it's the trailing columns that get singled out, but
> we don't know why. Hopefully the information about is detailed enough, but
> we can provide more information if necessary to help troubleshooting.
>
> Thanks in advance,
>
> Joaquin
>
>> sessionInfo()
> R version 2.14.0 (2011-10-31)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] limma_3.10.0
>
> loaded via a namespace (and not attached):
> [1] tools_2.14.0
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list