[BioC] Huber et al: Variance stabilization ...

Wolfgang Huber w.huber@dkfz-heidelberg.de
Wed, 12 Feb 2003 16:41:49 +0100


Hi,

one possible answer to your question is that MAS 5.0 is evil and that it is
better to use almost any other preprocessing strategy, such as RMA, dChip,
or MAS 4.0.

The variance-stabilizing transformation that we have proposed, as you
rightly remarked, only works with data that has a strictly positive, roughly
quadratic variance-mean dependence. That seems to exclude data produced with
MAS 5.0.
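
To make the "strictly positive, roughly quadratic" requirement concrete, here
is a minimal Python sketch. It assumes the variance-mean model
v(u) = (c1*u + c2)^2 + c3 with c3 > 0 and the corresponding transformation
h(y) = arsinh((c1*y + c2) / sqrt(c3)) / c1. The function names and the
parameter values are made up for illustration and are not fitted to any real
chip. If the fitted c3 comes out non-positive (for example because of a
negative intercept), the transformation cannot be applied.

```python
import math
import random
import statistics


def arsinh_vst(y, c1, c2, c3):
    """h(y) = arsinh((c1*y + c2) / sqrt(c3)) / c1, assuming v(u) = (c1*u + c2)**2 + c3."""
    if c1 <= 0 or c3 <= 0:
        # A non-positive c3 means the fitted variance touches or crosses zero,
        # so the arsinh form breaks down.
        raise ValueError("need c1 > 0 and c3 > 0")
    return math.asinh((c1 * y + c2) / math.sqrt(c3)) / c1


def simulate(mean, c1, c2, c3, n, rng):
    """Draw n normal values whose variance follows the assumed model at `mean`."""
    sd = math.sqrt((c1 * mean + c2) ** 2 + c3)
    return [rng.gauss(mean, sd) for _ in range(n)]


def transformed_sd(mean, c1, c2, c3, n=5000, seed=0):
    """Standard deviation of the transformed values at one true intensity."""
    rng = random.Random(seed)
    values = [arsinh_vst(y, c1, c2, c3) for y in simulate(mean, c1, c2, c3, n, rng)]
    return statistics.stdev(values)


if __name__ == "__main__":
    # Illustrative parameters only (assumptions, not estimates from data).
    for mean in (0.0, 100.0, 10000.0):
        print(mean, round(transformed_sd(mean, c1=0.1, c2=0.0, c3=100.0), 3))
```

Under these assumptions the standard deviation after transformation should
stay close to 1 over the whole intensity range, which is what variance
stabilization means here.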

There also seem to be other unpleasant effects associated with MAS 5.0. For
example, their use of abruptly different rules for the processing of the
probe intensities, depending on the continuous values PM and MM, seems to
really mess up the distribution of the data across different chips. (I.e.,
even if the intensities of a probe across different chips were normally
distributed, their resulting expression values could have a rather ugly
distribution.)
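
A toy simulation can illustrate this kind of effect. The two-branch rule
below is hypothetical and is NOT the MAS 5.0 algorithm; it only shows how
switching abruptly between rules, depending on whether PM exceeds MM, can
turn normally distributed inputs into a skewed output. All names and numbers
are assumptions chosen for illustration.

```python
import random
import statistics


def toy_expression(pm, mm, fallback_fraction=0.1):
    """Hypothetical two-branch rule: subtract MM when PM > MM, otherwise
    fall back to a fixed fraction of PM. Not the MAS 5.0 algorithm."""
    if pm > mm:
        return pm - mm
    return fallback_fraction * pm


def skewness(values):
    """Sample skewness (third standardized moment)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def simulate_chips(n=10000, seed=0):
    """Normally distributed PM and MM for one probe across n chips."""
    rng = random.Random(seed)
    pm = [rng.gauss(120.0, 30.0) for _ in range(n)]
    mm = [rng.gauss(100.0, 30.0) for _ in range(n)]
    return pm, mm


if __name__ == "__main__":
    pm, mm = simulate_chips()
    expr = [toy_expression(p, m) for p, m in zip(pm, mm)]
    print("skewness of PM:        ", round(skewness(pm), 3))
    print("skewness of expression:", round(skewness(expr), 3))
```

In this sketch the PM values are symmetric, while the derived values pile up
near the fallback branch and have a long right tail.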

Best regards -

Wolfgang

Division of Molecular Genome Analysis
German Cancer Research Center (DKFZ)
Im Neuenheimer Feld 580
69120 Heidelberg, Germany

w.huber@dkfz.de
http://www.dkfz.de/mga/whuber
Tel +49-6221-424709
Fax +49-6221-42524709


> -----Original Message-----
> From: bioconductor-admin@stat.math.ethz.ch
> [mailto:bioconductor-admin@stat.math.ethz.ch]On Behalf Of Hüsing,
> Johannes
> Sent: Wednesday, February 12, 2003 4:07 PM
> To: bioconductor@stat.math.ethz.ch
> Subject: [BioC] Huber et al: Variance stabilization ...
>
>
> Hi all (and especially the 1st author of the paper Huber et al. Variance
> stabilization applied to microarray data calibration and to the
> quantification of differential expression. Bioinformatics 2002),
>
> while shoehorning the data into a symmetric distribution is a popular
> motivation for transforming the data, I agree that variance
> stabilization is paramount when contrasts are to be meaningfully
> interpreted.
>
> The practical difficulty I experience is as follows. I am investigating
> Affymetrix chips processed by MAS 5 (the original program, not the
> Bioconductor algorithm which gives roughly proportional results). As all
> expression values here are positive, for very small values the variance
> is bounded by the mean (largest when all but one chip have zero
> expression for the current gene). When I fit a low-parameter function to
> the variance-mean dependency, I obtain a negative intercept or other
> parameter constellations that prohibit an arsinh transformation.
>
> In this situation it seems more like a curse than a blessing to me
> that negative values are verboten with the MAS 5.0 algorithm. Allowing
> negative values would take away the boundedness.
>
> Right now I am toying with loess fits to subsamples of the genes, and
> the general appearance is similar to a parabolic curve but with a
> broader base. I must admit that I have based my observations only on two
> chips, which makes a poor estimate for the variance, but with a lot of
> genes. The general problem of boundedness with positive values only
> remains though, and I'd expect negative intercepts to occur more often
> with more chips, as (mean, var) pairs close to 0 would appear less
> often and therefore exert less influence on the regression curve.
>
> Has anyone had a similar experience, and what are your suggestions?
>
> Greetings
>
>
> Johannes