[BioC] Huber et al: Variance stabilization ...

Wed, 12 Feb 2003 16:06:44 +0100

Hi all (and especially the 1st author of the paper Huber et al. Variance
stabilization applied to microarray data calibration and to the
quantification of differential expression. Bioinformatics 2002),

while shoehorning the data into a symmetric distribution is a popular
motivation for transforming the data, I agree that variance
stabilization is paramount when contrasts are to be interpreted
meaningfully.

The practical difficulty I experience is as follows. I am investigating
Affymetrix chips processed by MAS 5 (the original program, not the
Bioconductor algorithm, which gives roughly proportional results). As all
expression values here are positive, for very small values the variance
is bounded above by a function of the mean (it is largest when all but
one chip have zero expression for the current gene). When I fit a
low-parameter function to the variance-mean dependency, I obtain a
negative intercept or other parameter constellations that rule out an
arsinh transformation.
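
To make the obstacle concrete, here is a minimal sketch, assuming the
fitted function is a quadratic v(u) = a0 + a1*u + a2*u^2 (my choice of
low-parameter form for illustration; the coefficient values and function
names below are hypothetical). Completing the square gives
v(u) = (c1*u + c2)^2 + c3, and integrating 1/sqrt(v(u)) yields an arsinh
only if c3 > 0. Since c3 = a0 - c2^2, any negative intercept a0 forces
c3 < 0.

```python
import math

def complete_the_square(a0, a1, a2):
    """Rewrite v(u) = a0 + a1*u + a2*u**2 as (c1*u + c2)**2 + c3."""
    if a2 <= 0:
        raise ValueError("quadratic coefficient must be positive")
    c1 = math.sqrt(a2)
    c2 = a1 / (2.0 * c1)
    c3 = a0 - c2 ** 2
    return c1, c2, c3

def arsinh_transform(y, c1, c2, c3):
    """h(y) = arsinh((c1*y + c2) / sqrt(c3)) / c1, so that h'(y) = 1/sqrt(v(y))."""
    if c3 <= 0:
        raise ValueError("c3 <= 0: the arsinh form is not available")
    return math.asinh((c1 * y + c2) / math.sqrt(c3)) / c1

# Hypothetical coefficients, chosen only for illustration: c3 > 0 here.
c1, c2, c3 = complete_the_square(a0=5.0, a1=4.0, a2=1.0)
h = arsinh_transform(10.0, c1, c2, c3)

# A negative intercept (here with no linear term) gives c3 < 0,
# and arsinh_transform would raise ValueError for these parameters.
bad = complete_the_square(a0=-1.0, a1=0.0, a2=1.0)
```

The check on c3 is the whole point: the transformation itself is one
line, but it only exists for part of the parameter space.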

In this situation it seems more of a curse than a blessing to me that
negative values are verboten in the MAS 5.0 algorithm. Allowing
negative values would take away the boundedness.
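
The bound can be written down explicitly. For n non-negative values with
mean m, the sample variance (denominator n - 1) is at most n*m^2, and
this is attained when one chip carries all the signal and the others
read zero. A small sketch to check this numerically; the function name
and the numbers are made up for illustration:

```python
import statistics

def max_sample_variance(mean, n_chips):
    """Largest sample variance of n_chips non-negative values with the given mean."""
    return n_chips * mean ** 2

n_chips, mean = 4, 2.5

# Extreme case: all but one chip have zero expression for this gene.
extreme = [n_chips * mean] + [0.0] * (n_chips - 1)

# A more typical case with the same mean stays well below the bound.
typical = [2.0, 3.0, 2.5, 2.5]

bound = max_sample_variance(mean, n_chips)
var_extreme = statistics.variance(extreme)
var_typical = statistics.variance(typical)
```

Because the bound shrinks like m^2 as m approaches 0, the (mean, var)
pairs of low-expressed genes are squeezed towards the origin.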

Right now I am toying with loess fits to subsamples of the genes, and
the general appearance is similar to a parabolic curve but with a
broader base. I must admit that I have based my observations on only two
chips, which gives a poor variance estimate for each gene, although over
a lot of genes. The general problem of boundedness with positive-only
values remains though, and I'd expect negative intercepts to occur more
often with more chips, as (mean, var) pairs close to (0, 0) would appear
less often and therefore exert less influence on the regression curve.
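
For reference, this is roughly how the (mean, var) pairs can be formed
and summarised. It is only a sketch: the binned median below is a crude
stand-in for a loess fit, not loess itself, and the function names and
the four-gene toy data are invented for illustration.

```python
import statistics

def mean_var_pairs(chips):
    """chips: one list of expression values per chip, genes in the same order."""
    return [(statistics.mean(g), statistics.variance(g)) for g in zip(*chips)]

def binned_curve(pairs, n_bins):
    """Median mean and median variance in roughly equal-count bins ordered by mean."""
    ordered = sorted(pairs)
    size = max(1, len(ordered) // n_bins)
    curve = []
    for start in range(0, len(ordered), size):
        chunk = ordered[start:start + size]
        curve.append((statistics.median(m for m, _ in chunk),
                      statistics.median(v for _, v in chunk)))
    return curve

# Toy data: two chips, four genes (made-up values).
chip_a = [0.0, 1.0, 4.0, 10.0]
chip_b = [2.0, 1.0, 6.0, 14.0]

pairs = mean_var_pairs([chip_a, chip_b])
curve = binned_curve(pairs, n_bins=2)
```

With two chips each per-gene variance is just (x1 - x2)^2 / 2, which is
why the individual points are so noisy and only the smoothed curve over
many genes is informative.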

Has anyone had a similar experience, and what would you suggest?

Greetings

Johannes