[BioC] Log transformation and left censoring

Wolfgang Huber whuber at embl.de
Mon Feb 4 09:56:43 CET 2013


Hi Paul

the model (in the "parametric" case) is

  var(X)  = a * E(X) + b * E(X)^2

with some a, b > 0. You are right that var(X) = 0  <=>  E(X) = 0,
but since the support of X is non-negative, the only way to have E(X) = 0 is for the distribution of X to be a point mass at 0. So that assumption has to be correct. For microarray data (e.g. in vsn), a model with a constant term on the right-hand side of the above equation (so that var(X) > 0 even if E(X) = 0) was needed, but it is hard to see how that could be the case for count data.
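
To make the shape of such a transformation concrete, here is a minimal sketch in Python. It is not DESeq's code: it is the textbook delta-method transformation h with h'(x) = 1/sqrt(a*x + b*x^2), which for the model above works out to an asinh form. The function names and the values of a and b are made up for illustration.

```python
import math

def vst(x, a, b):
    """Delta-method transform for var(X) = a*E(X) + b*E(X)^2 (a, b > 0).

    Solves h'(x) = 1 / sqrt(a*x + b*x^2) with h(0) = 0.
    """
    return (2.0 / math.sqrt(b)) * math.asinh(math.sqrt(b * x / a))

def vst_slope(x, a, b):
    """Central-difference estimate of h'(x), for x > 0."""
    eps = 1e-5 * x
    return (vst(x + eps, a, b) - vst(x - eps, a, b)) / (2.0 * eps)

# Hypothetical coefficients, chosen only for this illustration.
a, b = 1.0, 0.1

for mu in (1.0, 10.0, 100.0, 1000.0):
    sd_x = math.sqrt(a * mu + b * mu ** 2)
    # Delta method: sd(h(X)) is approximately h'(mu) * sd(X).
    print(mu, vst_slope(mu, a, b) * sd_x)

# Limiting behaviour: square root for small x, logarithm for large x.
print(vst(1e-4, a, b), 2.0 * math.sqrt(1e-4 / a))
print(vst(1e6, a, b), math.log(4.0 * b * 1e6 / a) / math.sqrt(b))
```

The first loop checks that the approximate standard deviation on the transformed scale stays close to 1 across the range of means. The last two lines compare the transform with 2*sqrt(x/a) near zero and with log(4*b*x/a)/sqrt(b) for large x. Note that vst(0) = 0 in every sample, consistent with zero being mapped to a single value.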

However, it could be that the variance-mean relationship of your data is not fit by any well-behaved smooth function, e.g. if there are strong determinants of the variance other than the mean (e.g. type of gene). In that case, there is no variance-stabilising transformation. One would need to explore your data for that.
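
One rough way to start such an exploration is sketched below in Python. It is only an illustration, not what DESeq does internally: it fits a and b by ordinary least squares of var/mean on mean and then looks at the log ratio of observed to fitted variance per gene. The function names and the toy counts are invented for the example.

```python
import math
from statistics import mean, variance

def mean_var_pairs(counts_by_gene):
    """Per-gene (mean, sample variance); genes with mean 0 are skipped."""
    pairs = [(mean(c), variance(c)) for c in counts_by_gene]
    return [(m, v) for m, v in pairs if m > 0]

def fit_a_b(pairs):
    """Fit var = a*mean + b*mean^2 via OLS of var/mean on mean."""
    xs = [m for m, _ in pairs]
    ys = [v / m for m, v in pairs]
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    return a, b

def log_ratio_residuals(pairs, a, b):
    """log(observed variance / fitted variance), where both are positive."""
    out = []
    for m, v in pairs:
        fitted = a * m + b * m * m
        if v > 0 and fitted > 0:
            out.append(math.log(v / fitted))
    return out

# Made-up counts: one row per gene, one column per sample.
toy_counts = [
    [0, 1, 0, 2],
    [5, 9, 4, 7],
    [40, 61, 52, 35],
    [300, 410, 250, 520],
]

pairs = mean_var_pairs(toy_counts)
a_hat, b_hat = fit_a_b(pairs)
print(a_hat, b_hat)
print(log_ratio_residuals(pairs, a_hat, b_hat))
```

If the residuals show structure that tracks something other than the mean (for example, they cluster by gene type), that would suggest the variance is not a smooth function of the mean alone. An unweighted fit like this is dominated by highly expressed genes, so it is a starting point for plots rather than an estimator to rely on.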

	Best wishes
	Wolfgang 


On Feb 4, 2013, at 7:33 AM, Paul Harrison <Paul.Harrison at monash.edu> wrote:

> On Sun, Feb 3, 2013 at 10:24 PM, Wolfgang Huber <whuber at embl.de> wrote:
>> 
>> Hi Paul
>> 
>> given your description, one possibility to explore might be a variance
>> stabilising transformation.
>> 
>> E.g. DESeq provides one that smoothly interpolates between the square-root
>> function for low counts and the log-transformation for higher counts, see
>> Section 6 (and 7) of the vignette.
>> 
> 
> Thanks Wolfgang. Zero is transformed to the same value in each of the
> samples, which is good. The ability of limma to detect differential
> expression with this transformation appears to be about the same as
> with voom. But perhaps I should be using DESeq's nbinomTest.
> 
> I do wonder if this is still too mild near zero. When I plot variance
> for each gene vs mean for each gene, there is a distinct bump at low
> expression levels. Looking at vst.pdf the assumption seems to be that
> the variance of a zero count is zero, which isn't correct. Seeing a
> zero count, my expectation of the true mean would be somewhat greater
> than zero, and so the variance would also be somewhat greater than
> zero.
> 
> --
> Paul Harrison
> 
> Victorian Bioinformatics Consortium / Monash University


