[BioC] Missing Values after cyclic loess in limma

Sun Nov 4 10:39:37 CET 2012

Hi Gordon

thanks for the many good points.

Literature: I don't have an overview of the uses or citations of vsn, and benchmarking normalisation methods is, as we know, a complex topic, but below I paste some recent references in which vsn was used for multiple (dozens to hundreds of) single-colour arrays.

Affine linear: vsn transforms each array's data x with the transformation glog(a*x+b), with array-specific parameters a and b and an overall function (the same for all arrays) glog(y) = log2((y + sqrt(y^2+1))/2). The array-specific part is affine linear.
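A minimal Python sketch of the transformation described above, assuming only the formula as written. The parameter values are hypothetical and chosen by hand, whereas vsn estimates a and b from the data. It uses the identity log2((y + sqrt(y^2+1))/2) = asinh(y)/ln 2 - 1, which avoids cancellation for large negative y, and shows that glog agrees with log2 at high intensities but, unlike log2, stays finite at zero and for negative arguments.

```python
import math

def glog2(y):
    """Generalized log base 2: log2((y + sqrt(y^2 + 1)) / 2).

    Computed as asinh(y) / ln(2) - 1, which is algebraically identical but
    numerically stable for large negative y.
    """
    return math.asinh(y) / math.log(2.0) - 1.0

def vsn_style_transform(x, a, b):
    """Array-specific affine step a*x + b followed by the common glog2 step.

    In vsn, a and b are estimated per array; here they are supplied by hand.
    """
    return glog2(a * x + b)

# For large arguments glog2 is practically log2; at zero and below it stays finite.
for y in (-10.0, 0.0, 1.0, 1e3, 1e6):
    log2_text = f"{math.log2(y):.4f}" if y > 0 else "undefined"
    print(f"y={y}: glog2={glog2(y):.4f}, log2={log2_text}")

# Hypothetical parameters for one array: a = 0.5, b = -1.0
print(vsn_style_transform(4.0, a=0.5, b=-1.0))
```

The asinh form is just a numerically safer way to evaluate the same expression; for moderate inputs both forms give the same values.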

Cyclic loess is an iterative algorithm (as is vsn), and its implementation in limma by default stops after 3 iterations regardless of whether convergence has been reached. While I agree that this is numerically robust, isn't the lack of a data-dependent convergence diagnostic a reason to worry?
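To make the convergence question concrete, here is a small Python sketch of pairwise cyclic normalisation that records the largest adjustment made in each sweep, so that a fixed iteration count can be compared with a tolerance-based stopping rule. It is not limma's implementation: a crude running-mean smoother stands in for loess, and the simulated data, the window size k and the tolerance tol are all hypothetical.

```python
import random

def running_mean_smoother(xs, ys, k):
    """Crude stand-in for loess: the fit at each point is the mean of ys over
    a window of k points that are neighbours in the ordering of xs."""
    n = len(xs)
    k = min(k, n)
    order = sorted(range(n), key=lambda i: xs[i])
    fitted = [0.0] * n
    for rank, idx in enumerate(order):
        start = min(max(0, rank - k // 2), n - k)  # keep the window inside the data
        fitted[idx] = sum(ys[j] for j in order[start:start + k]) / k
    return fitted

def cyclic_normalize(arrays, iterations=3, k=51, tol=None):
    """Pairwise cyclic normalisation of log-scale arrays (equal-length lists).

    Each sweep visits every pair (i, j), smooths M = y_i - y_j against
    A = (y_i + y_j) / 2 and moves each array by half the fitted curve.
    Returns the normalised arrays and the largest single adjustment per sweep;
    if tol is given, stops once a sweep's largest adjustment is below tol.
    """
    ys = [list(a) for a in arrays]
    history = []
    for _ in range(iterations):
        biggest = 0.0
        for i in range(len(ys) - 1):
            for j in range(i + 1, len(ys)):
                m = [a - b for a, b in zip(ys[i], ys[j])]
                avg = [(a + b) / 2 for a, b in zip(ys[i], ys[j])]
                for g, f in enumerate(running_mean_smoother(avg, m, k)):
                    ys[i][g] -= f / 2
                    ys[j][g] += f / 2
                    biggest = max(biggest, abs(f) / 2)
        history.append(biggest)
        if tol is not None and biggest < tol:
            break
    return ys, history

# Simulated log2 intensities with hypothetical per-array (offset, slope) biases.
random.seed(1)
true_signal = [random.uniform(4.0, 14.0) for _ in range(500)]
biases = [(0.0, 0.0), (0.5, 0.05), (-0.3, -0.08), (0.2, 0.1)]
arrays = [[t + off + slope * (t - 9.0) + random.gauss(0.0, 0.1) for t in true_signal]
          for off, slope in biases]

normalized, history = cyclic_normalize(arrays, iterations=10, tol=1e-3)
print("largest adjustment per sweep:", [round(h, 4) for h in history])
```

With a fixed iteration count one simply stops after that many entries of history, whatever the last adjustment was; a tolerance turns the same quantity into a data-dependent stopping rule.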

I also have a question for you: there is nothing intrinsically bivariate (1D regressor, 1D response) about local regression. Multivariate approaches have been proposed (e.g. Kepler, Crosby and Morgan in Genome Biology 2002), and good implementations exist in R (e.g. the locfit package). So why is it so popular to do this pairwise, with the obvious drawback of n^2 complexity?
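The n^2 cost is simple arithmetic: one sweep of the pairwise scheme fits a curve for every pair of arrays, n(n-1)/2 fits, whereas a scheme that fits each array once against a common reference needs only n. The reference-based count is a hypothetical comparison, not the multivariate approach referred to above. A small Python sketch:

```python
def pairwise_fits_per_sweep(n_arrays: int) -> int:
    """Number of array pairs, i.e. smoother fits per sweep of pairwise cyclic loess."""
    return n_arrays * (n_arrays - 1) // 2

def reference_fits_per_sweep(n_arrays: int) -> int:
    """Fits per sweep if each array were instead fitted once against a common reference."""
    return n_arrays

SWEEPS = 3  # the fixed number of iterations mentioned in the thread
for n in (10, 50, 100, 300):
    print(f"{n:>4} arrays: {SWEEPS * pairwise_fits_per_sweep(n):>7} pairwise fits "
          f"vs {SWEEPS * reference_fits_per_sweep(n):>5} reference fits")
```

For 100 arrays and three sweeps this gives 14,850 pairwise fits against 300 reference fits.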

Best wishes
Wolfgang  

Zhenyu Xu*, Wu Wei*, Julien Gagneur, Fabiana Perocchi, Sandra Clauder-Muenster, Jurgi Camblong, Elisa Guffanti, Francoise Stutz, Wolfgang Huber, and Lars M. Steinmetz. Bidirectional promoters generate pervasive transcription in yeast. Nature, 457(7232):1033-1037, 2009.

Zhenyu Xu*, Wu Wei*, Julien Gagneur*, Sandra Clauder-Münster, Miłosz Smolik, Wolfgang Huber, and Lars M. Steinmetz. Antisense expression increases gene expression variability and locus interdependency. Molecular Systems Biology, 7, 2011. 

E. Benito, L. M. Valor, M. Jimenez-Minchan, W. Huber, and A. Barco. cAMP response element-binding protein is a primary hub of activity-driven neuronal gene expression. Journal of Neuroscience, 31:18237-18250, 2011.

Ramona Schmid, Patrick Baum, Carina Ittrich, Katrin Fundel-Clemens, Wolfgang Huber, Benedikt Brors, Roland Eils, Andreas Weith, Detlev Mennerich, and Karsten Quast. Comparison of normalization methods for Illumina BeadChip(R) HumanHT-12 v3. BMC Genomics, 11:349, 2010. 

On Nov 4, 2012, at 1:05 AM, Gordon K Smyth <smyth at wehi.EDU.AU> wrote:

> The problem had nothing to do with the loess function.
> 
> I do not know of any objective grounds by which one could claim vsn to be more numerically robust than loess.  The former requires iterative parameter estimation whereas loess is a closed-form calculation requiring nothing more complex than linear regression.
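To illustrate the closed-form point: a single local linear fit at a point x0 is weighted least squares with tricube weights, solvable directly from the 2x2 normal equations with no iteration. The Python sketch below assumes one non-robust local fit (no robustness reweighting) and uses a span of 0.75, the default of R's loess(), only as an example value; it is not R's loess() code. Because the fitted value is a fixed weighted combination of the ys, it is also linear in the response.

```python
import math

def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, otherwise 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def local_linear_fit(xs, ys, x0, span=0.75):
    """Fitted value at x0 from one tricube-weighted straight-line fit.

    Solves the 2x2 weighted least-squares normal equations in closed form;
    there is no iteration and no robustness reweighting.
    """
    n = len(xs)
    q = min(n, max(3, math.ceil(span * n)))     # points in the local window
    h = sorted(abs(x - x0) for x in xs)[q - 1]   # bandwidth: distance to q-th nearest point
    if h == 0:
        raise ValueError("need distinct x values near x0")
    w = [tricube((x - x0) / h) for x in xs]
    d = [x - x0 for x in xs]                      # centre at x0 so the intercept is the fit
    sw = sum(w)
    swd = sum(wi * di for wi, di in zip(w, d))
    swdd = sum(wi * di * di for wi, di in zip(w, d))
    swy = sum(wi * yi for wi, yi in zip(w, ys))
    swdy = sum(wi * di * yi for wi, di, yi in zip(w, d, ys))
    det = sw * swdd - swd ** 2
    return (swdd * swy - swd * swdy) / det        # intercept via Cramer's rule

xs = [float(i) for i in range(20)]
line = [2.0 * x + 1.0 for x in xs]
print(local_linear_fit(xs, line, 7.5))   # a straight line is reproduced (about 16.0)

# The fit is linear in the response: fit(y1 + y2) equals fit(y1) + fit(y2).
y1 = [math.sin(x) for x in xs]
y2 = [x * x for x in xs]
both = [a + b for a, b in zip(y1, y2)]
print(local_linear_fit(xs, both, 7.5)
      - (local_linear_fit(xs, y1, 7.5) + local_linear_fit(xs, y2, 7.5)))
```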
> 
> The literature does indeed suggest that cyclic loess would be an obvious choice in high-DE situations, which is the context here.  There is no literature that I know of supporting vsn in this context.
> 
> Affine functions are linear transformations with an intercept.  Vsn is not a linear transformation while, ironically, the local polynomials used by loess are.
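The distinction can be checked numerically in Python. One necessary property of an affine map a*x + b is that it sends the midpoint of two inputs to the midpoint of their images; glog(a*x + b) fails this. The parameter values a = 0.02 and b = -1 are hypothetical, chosen only for illustration.

```python
import math

def glog2(y):
    """log2((y + sqrt(y^2 + 1)) / 2), computed stably as asinh(y) / ln 2 - 1."""
    return math.asinh(y) / math.log(2.0) - 1.0

def affine(x, a=0.02, b=-1.0):
    """Array-specific affine step; a and b are hypothetical values."""
    return a * x + b

def preserves_midpoints(f, x1, x2, tol=1e-9):
    """True if f maps the midpoint of x1 and x2 to the midpoint of f(x1) and f(x2)."""
    return abs(f((x1 + x2) / 2) - (f(x1) + f(x2)) / 2) < tol

x1, x2 = 100.0, 5000.0
print(preserves_midpoints(affine, x1, x2))                      # True: affine step
print(preserves_midpoints(lambda x: glog2(affine(x)), x1, x2))  # False: glog(a*x + b)
```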
> 
> Gordon
> 
>> Date: Fri, 2 Nov 2012 19:45:45 +0100
>> From: Wolfgang Huber <whuber at embl.de>
>> To: "Claus Mayer [guest]" <guest at bioconductor.org>
>> Cc: bioconductor at r-project.org
>> Subject: Re: [BioC] Missing Values after cyclic loess in limma
>> 
>> Hi Claus
>> 
>> If affine functions might already do a good enough job for you, compared with loess's local polynomials, then "vsn" might be an option; it is intended to be more numerically robust.
>> 
>> 	Best wishes
>> 	Wolfgang
>> 
>> On Nov 2, 2012, at 6:45 PM, "Claus Mayer [guest]" <guest at bioconductor.org> wrote:
>> 
>>> 
>>> Hello,
>>> 
>>> I am working on my first-ever single-channel Agilent array data set. Because I expect large changes in differential expression, I wanted to use cyclic loess normalisation within limma rather than quantile normalisation. I used the default settings, i.e.
>>> 
>>> y<-normalizeBetweenArrays(x,method="cyclicloess")
>>> 
>>> where x is the EListRaw object. As expected, this took a while, but to my surprise it produced hundreds of missing values for each array, as indicated by the message
>>> 
>>> Warning message:
>>> In log2(Recall(object$E, method = method, ...)) : NaNs produced
>>> 
>>> I checked the raw values, which are all well above 0 and include no NAs. I also did not use any background correction, so I don't understand why taking logs should produce any missing values. I had assumed that the method would first log the data and then apply the cyclic loess algorithm, which in itself shouldn't produce any NAs either. Have I misunderstood something basic here?
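The warning text shows log2() wrapped around the recursive call Recall(object$E, ...), which suggests that the intensities were normalised on the raw scale and logged afterwards. The Python sketch below only illustrates the arithmetic consequence of that order. It uses made-up intensities and a hypothetical constant correction (the mean of M) standing in for the loess fit. Splitting an additive correction between two arrays on the raw scale can push low intensities below zero, where log2 is undefined (R returns NaN), whereas the same correction applied to logged data stays finite.

```python
import math

def r_log2(v):
    """Mimic R's log2(): -Inf at zero and NaN (with a warning in R) for negative input."""
    if v > 0:
        return math.log2(v)
    return float("-inf") if v == 0 else float("nan")

def split_correction(y1, y2):
    """Hypothetical stand-in for one cyclic-loess step on a pair of arrays:
    the 'fit' is just the mean of M = y1 - y2, split half-and-half."""
    m = [a - b for a, b in zip(y1, y2)]
    fit = sum(m) / len(m)
    return [a - fit / 2 for a in y1], [b + fit / 2 for b in y2]

# Made-up raw intensities: all positive, array 2 brighter (more so at high intensity).
raw1 = [20.0, 40.0, 5000.0, 8000.0]
raw2 = [30.0, 60.0, 15000.0, 24000.0]

# Order suggested by the warning: normalise raw values, then take logs.
n1, n2 = split_correction(raw1, raw2)
raw_then_log = [[r_log2(v) for v in n1], [r_log2(v) for v in n2]]

# Log first, then normalise: corrected values are differences of finite logs.
log_then_norm = split_correction([math.log2(v) for v in raw1],
                                 [math.log2(v) for v in raw2])

print(raw_then_log)    # the two low intensities of array 2 come out as nan
print(log_then_norm)
```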
>>> 
>>> Thanks,
>>> 
>>> Claus
>>> 
>>> 
>>> 
>>> -- output of sessionInfo():
>>> 
>>> R version 2.13.0 (2011-04-13)
>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>> 
>>> other attached packages:
>>> [1] limma_3.8.2
>>> 
>>> --