[BioC] limma with missing values

Hans-Ulrich Klein h.klein at uni-muenster.de
Thu Mar 6 13:43:34 CET 2008

Dear all,

I am struggling with using limma in the case of many missing values. 
Here is a small example:

df = data.frame(

  conts = list(treat="contr.sum", class="contr.treatment")
  X = t(df[,1:3])
  D = model.matrix( ~ class + treat, data=df, contrasts=conts)
  Fit = lmFit(X, D)

These are the results:

 >   Fit$coefficients
   (Intercept)       classB treatT2 treatT3
g1           5 2.220446e-16       1      NA
g2           5 2.220446e-16      NA       1
g3           6 3.330669e-16      -1      NA

Suppose I am interessted in the comparison of T2-T1. The column 
"treatT2" of "Fit$coefficients" looks fine for gene 1 and gene 2. In the 
case of gene 3 the baseline level T1 can not be estimated and the 
coefficient treatT2 seems to be the comparison of T2-T3. I find this 
confusing when computing many thousand models and applying toptable(Fit, 
coef="treatT2") afterwards.

I changed the design matrix by setting
conts = list(treat="contr.sum", class="contr.sum")
Now, I get the results:

 >   Fit$coefficients
   (Intercept)        class1 treat1 treat2
g1         6.0 -1.665335e-16   -1.0     NA
g2         5.5 -1.665335e-16   -0.5     NA
g3         5.0 -1.110223e-16   -1.0     NA

However, I expected:
    Intercept    treat1   treat2
g1   5.5              -0.5     0.5
g2   5.5              -0.5     NA
g3   5.5              NA      -0.5

Is it appropriate to use limma in such scenarios? Did I something wrong 
or misinterpreted the results? I think, replacing the NAs by mean 
intensities would be a solution in this case.


 > sessionInfo()
R version 2.6.2 (2008-02-08)


attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] limma_2.12.0

loaded via a namespace (and not attached):
[1] rcompgen_0.1-17

More information about the Bioconductor mailing list