[BioC] missing value handling in limma

Tue Jun 8 00:46:00 CEST 2004

At 04:53 AM 8/06/2004, xiaocui zhu wrote:
>Hi all,
>
>I recently used the linear model fit in limma to rank differentially
>expressed genes between treated vs. control with a data set. The data
>includes three log2(Treated/Control) replicate sets, and a dyeSwap for
>each replicate. So the design matrix is c(1,-1,1,-1,1,-1).  Among the
>top-ranked genes, I noticed some of them have only one log2Ratio
>measurement with the rest being "NA". I set the log2Ratio of a gene to
>"NA", if its green or red intensity measurement is below background,
>saturated, low intensity, or non-uniform.  I am wondering how the linear
>model in limma handles missing values and why a gene with only one data
>point is identified as a high ranking differentially expressed gene.

It is perfectly possible, although very unlikely, for a gene with only
one non-missing value to be top-ranked. It would have to have an
extraordinarily large fold change for this to happen.

limma handles missing values in the usual way for linear models at the
lmFit() step: each gene is fitted using only its non-missing values. A
gene with only one value will therefore get df.residual=0. At the
shrinkage step, the residual standard deviation for such a gene will be
reset to the consensus value across all genes, and the corresponding
degrees of freedom will be df.prior. This is explained in the article
Smyth, SAGMB, 2004, cited in the documentation.
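
The shrinkage step described above can be sketched numerically. This is
a minimal illustration in plain Python, not limma's own code. It assumes
the posterior variance is the df-weighted average of the prior and
residual variances, as in the cited article. The function names and the
numbers used for the prior (s2_prior=0.04, df_prior=4) are hypothetical;
in a real analysis the prior values are estimated from all genes.

```python
import math

def moderated_variance(s2, df_residual, s2_prior, df_prior):
    """Return (posterior variance, total df) for one gene.

    s2 is the gene's residual variance; it is ignored when
    df_residual is 0, because it is undefined in that case.
    """
    if df_residual == 0:
        # Only one observation: fall back entirely on the consensus value.
        return s2_prior, df_prior
    df_total = df_prior + df_residual
    s2_post = (df_prior * s2_prior + df_residual * s2) / df_total
    return s2_post, df_total

def moderated_t(coef, stdev_unscaled, s2_post):
    """Moderated t-statistic: coefficient over its shrunken standard error."""
    return coef / (stdev_unscaled * math.sqrt(s2_post))

# Hypothetical prior values, chosen only for illustration.
S2_PRIOR, DF_PRIOR = 0.04, 4.0

# Gene A: one non-missing log-ratio, so df.residual = 0 and the
# unscaled standard deviation of the coefficient is 1.
s2_a, df_a = moderated_variance(None, 0, S2_PRIOR, DF_PRIOR)
t_a = moderated_t(3.0, 1.0, s2_a)

# Gene B: all six log-ratios present, so df.residual = 5 and the
# unscaled standard deviation is 1/sqrt(6).
s2_b, df_b = moderated_variance(0.10, 5, S2_PRIOR, DF_PRIOR)
t_b = moderated_t(3.0, 1.0 / math.sqrt(6), s2_b)

print(df_a, t_a)
print(df_b, t_b)
```

Under these assumptions the single-value gene keeps a finite
t-statistic on df.prior degrees of freedom, but its standard error is
not reduced by replication, so only a very large log-ratio can carry it
to the top of the ranking.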

Gordon

>Thank you for your help in advance!
>
>Xiaocui