[BioC] on the meaning and value of $stdev.unscaled in a limma's lmFitted MArrayLM object

Fri Dec 4 11:59:59 CET 2009

Greetings all,

I am writing to enquire about the meaning of some elements that are
provided in an limma::MArray-LM object resulting from a call to
limma::lmFit();

To avoid a verbose output, I tried to minimize the number of genes to
be fitted, by selecting just two genes from my expression set:

>eset.2167 <- eset.more[c(848,849)]
>fit.2167 <- lmFit(eset.2167, designo)  # where 'designo' is my design matrix

I did try fitting a selection from my expression set containing just
one gene, but I understand that lmFit() does not like this, as it
yields:

"Error in  fit$effects[(fit$rank + 1):narrays, , drop = FALSE] :
incorrect number of dimensions"

My concern, specifically, is that I am not sure how the standard
errors on the lmFit's coefficients are calculated, as they are
returned with a value that differs from what I get from a separate run
in MSExcel, using the built-in function linest(), with the same design
matrix and vector of y kept the same. Convincingly, the output of best
values of the fitted coefficients is identical in lmFit and MSExcel's
linest().

Take for example $stdev.unscaled which is defined (from ?MArrayLM) as

"$stdev.unscaled
matrix containing unscaled standard deviations of the coefficients or contrasts"

> fit.2167$stdev.unscaled
           (Intercept)   Dose1Gy Ageing6mo Dose1Gy:Ageing6mo
Ageing6mo:LabLNGS Dose1Gy:Ageing6mo:LabLNGS
A_23_P8812         0.5 0.7071068 0.7071068                 1
0.7071068                         1
A_24_P7642         0.5 0.7071068 0.7071068                 1
0.7071068                         1

Now to MSExcel, focusing on the first of the two genes (Gene ID 2167
mapped by probe A_23_P8812).  Here are the standard errors on the
fitted coefficients, as results from a call to linest()

Dose1Gy:Ageing6mo:LabLNGS	Ageing6mo:LabLNGS	Dose1Gy:Ageing6mo	Ageing6mo	Dose1Gy	(Intercept)
best estimates 	0,50034025	-1,75687825	-0,3867455	3,055038	0,21681525	8,00081275
std errors	
0,262705064	0,185760532	0,262705064	0,185760532	0,185760532	0,131352532

As one can see, the stderr on the (intercept), for example, is
0,131352532, whereas limma:lmFit() gave my a 0.5.

What is the reason for such apparent discrepancy?
Is there any a-priori mistake made here, since Excel fitted just one
gene while lmFit() focused on two genes in this specific example?

Thanking you in advance, here is my sessionInfo()
Yours,
Massimo

> sessionInfo()
R version 2.10.0 (2009-10-26)
x86_64-apple-darwin9.8.0

locale:
[1] C

attached base packages:
 [1] grid      tcltk     tools     stats     graphics  grDevices utils
    datasets  methods   base

other attached packages:
 [1] GOstats_2.12.0          graph_1.22.2            Category_2.12.0
      affy_1.24.2             gplots_2.7.3            caTools_1.10
 [7] bitops_1.0-4.1          gdata_2.6.1             gtools_2.6.1
      hgug4112a.db_2.3.5      org.Hs.eg.db_2.3.6      RSQLite_0.7-3
[13] DBI_0.2-4               Agi4x44PreProcess_1.6.0 genefilter_1.28.0
      annotate_1.24.0         AnnotationDbi_1.7.20    limma_3.2.1
[19] Biobase_2.6.0           svGUI_0.9-46            svSocket_0.9-48
      svMisc_0.9-56

loaded via a namespace (and not attached):
[1] GO.db_2.3.5          GSEABase_1.8.0       RBGL_1.20.0
XML_2.6-0            affyio_1.13.5        preprocessCore_1.7.9
splines_2.10.0
[8] survival_2.35-7      xtable_1.5-6
>
Massimo Pinto
Post Doctoral Research Fellow
Enrico Fermi Centre and Italian Public Health Research Institute (ISS), Rome
http://claimid.com/massimopinto