[BioC] Log transformation and left censoring

Thu Feb 7 06:43:42 CET 2013

On Mon, Feb 4, 2013 at 7:56 PM, Wolfgang Huber <whuber at embl.de> wrote:
>
> Hi Paul
>
> the model (in the "parametric" case) is
>
>   var(X)  = a * E(X) + b * E(X)^2
>
> with some a,b>0. And you are right that, under this model, var(X)=0  <=>  E(X)=0,
> but since the support of X is non-negative, the only way that E(X)=0 is when the distribution of X is a point mass at 0. So that assumption has to be correct. For microarray data (e.g. in vsn), a model with a constant term on the right-hand side of the above equation (so that Var(X)>0 even if E(X)=0) was needed, but it is hard to see how that could be the case for count data.
>
> However, it could be that the variance-mean relationship of your data is not fit by any well-behaved smooth function, e.g. if you have other strong determinants of the variance than the mean (e.g. type of gene). In that case, there is no variance-stabilising transformation. One would need to explore your data for that.
>

OK, if I wish to proceed with my folly, it looks like the glog function
log(x+sqrt(x*x+1)) used by vsn would be a good choice, since it is
based on var(X) = a + b * E(X)^2. This does actually look marginally
better. The method of deriving a variance-stabilizing transform from
var(X) as a function of E(X) involves a first-order (delta method)
approximation; maybe that's breaking down slightly, or maybe my data
is weird.
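
To make the approximation concrete: the delta method picks h so that
h'(x) = 1/sqrt(v(x)), where v is the variance as a function of the mean.
For v(x) = a + b*x^2 this gives h(x) = asinh(sqrt(b/a)*x)/sqrt(b), which
is the glog above up to scaling. The following is a minimal sketch in
Python (not R, and not vsn's fitted transform, which also estimates
calibration parameters). The coefficients A and B, the normal noise
model and the function names are all assumptions made for illustration.

```python
import math
import random
import statistics

# Hypothetical coefficients in var(X) = A + B * E(X)^2 (not taken from any data).
A, B = 1.0, 0.01


def glog(x):
    """log(x + sqrt(x*x + 1)); algebraically the same as asinh(x).

    Written naively, so it loses precision for large negative x.
    """
    return math.log(x + math.sqrt(x * x + 1.0))


def vst(x, a=A, b=B):
    """Delta-method transform h satisfying h'(x) = 1 / sqrt(a + b * x^2)."""
    return math.asinh(math.sqrt(b / a) * x) / math.sqrt(b)


def transformed_variance(mu, n=20000, seed=0):
    """Sample variance of vst(X) when X ~ Normal(mu, A + B * mu^2).

    The normal noise model is an assumption chosen only to keep the
    sketch simple; real count data would not be normal.
    """
    rng = random.Random(seed)
    sd = math.sqrt(A + B * mu * mu)
    values = [vst(rng.gauss(mu, sd)) for _ in range(n)]
    return statistics.variance(values)


# If the first-order approximation holds, each variance should be near 1.
results = {mu: transformed_variance(mu) for mu in (0.0, 10.0, 100.0, 1000.0)}
for mu, var in results.items():
    print(f"mean {mu:7.1f}: variance after transform = {var:.3f}")
```

With these settings the transformed variances should all sit close to 1.
The approximation is only first order, so it degrades when the spread of
X around its mean is large relative to the curvature of h, which is one
way the "breaking down slightly" above could arise.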

Thanks, I feel I have a few different perspectives and a better idea
of the history of these ideas now.

Paul Harrison

Victorian Bioinformatics Consortium / Monash University