[BioC] P values on Log or Non-Log Values

Tue May 6 11:30:53 MEST 2003

Hi,

> At 01:36 AM 6/05/2003, James MacDonald wrote:
> > From a theoretical standpoint it is more correct to do t-tests on logged
> > data because one of the assumptions of the t-test is that the underlying
> > data are normally distributed. Microarray expression values are almost
> > always strongly right-skewed, and logging causes the distribution to
> > become much more symmetrical.
> ...
> But the main point here is, as Jim says, it has to be a whole lot better on
> the log-scale because the log-intensities are more symmetrically distributed.

Blythe Durbin has done some studies on the effect of transformations on
the distribution of microarray data [1], comparing raw scale, log scale,
and a "generalized log", i.e. a function of the form

  f(x) = log(x+sqrt(x^2+c^2)) - log(2)

that behaves like log(x) for x>>c and like a linear function near x=0,
where f(x) ~ log(c/2) + x/c. The log works well at high intensities, but
for small x it can produce strongly fluctuating values and even introduce
skewness, so the generalized log is in many cases a good interpolation
between the two regimes.
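
The two limits can be checked numerically. The following is a minimal
Python sketch, not code from Durbin's study; the value c = 100 and the
test intensities are arbitrary assumptions chosen only for illustration.

```python
import math


def glog(x, c):
    """Generalized log from the post: log(x + sqrt(x^2 + c^2)) - log(2)."""
    return math.log(x + math.hypot(x, c)) - math.log(2.0)


c = 100.0  # hypothetical value, chosen only for this illustration

# Far above c, glog should be nearly indistinguishable from the plain log.
big = 1e6
diff_high = glog(big, c) - math.log(big)

# Near zero, glog should follow the straight line log(c/2) + x/c.
small = 1.0
linear_approx = math.log(c / 2.0) + small / c
diff_low = glog(small, c) - linear_approx

# Unlike the plain log, glog is also defined at zero and for negative
# values, which occur after background subtraction.
at_zero = glog(0.0, c)
at_negative = glog(-50.0, c)

print("glog - log at x = 1e6:        ", diff_high)
print("glog - linear approx at x = 1:", diff_low)
print("glog(0), glog(-50):           ", at_zero, at_negative)
```

Both differences come out very small, while the plain log would fail
outright on the zero and negative inputs.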

Another nice property of the latter is that for a suitable choice of c it
can stabilize the variance, i.e. make the standard deviation of the data
approximately independent of their mean.
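
To illustrate the variance-stabilizing property, here is a small
simulation sketch in Python. It rests on assumptions that are not part of
this post: measurements are taken to follow a two-component error model,
y = mu*exp(eta) + eps, with multiplicative noise eta and additive noise
eps, and c is set to the ratio of the additive to the multiplicative
standard deviation. All numerical values are made up.

```python
import math
import random
import statistics


def glog(x, c):
    """Generalized log from the post: log(x + sqrt(x^2 + c^2)) - log(2)."""
    return math.log(x + math.hypot(x, c)) - math.log(2.0)


rng = random.Random(0)  # fixed seed so the run is reproducible

# Assumed error model (not from the post): y = mu * exp(eta) + eps
sd_additive = 20.0        # hypothetical sd of eps
sd_multiplicative = 0.1   # hypothetical sd of eta
c = sd_additive / sd_multiplicative  # assumed "suitable choice" of c

true_levels = [0.0, 200.0, 1000.0, 5000.0, 20000.0]  # made-up true intensities
n = 5000  # simulated measurements per level

sd_raw = []
sd_glog = []
for mu in true_levels:
    y = [
        mu * math.exp(rng.gauss(0.0, sd_multiplicative))
        + rng.gauss(0.0, sd_additive)
        for _ in range(n)
    ]
    sd_raw.append(statistics.stdev(y))
    sd_glog.append(statistics.stdev([glog(v, c) for v in y]))

# Ratio of largest to smallest sd across levels; 1 means perfectly stable.
ratio_raw = max(sd_raw) / min(sd_raw)
ratio_glog = max(sd_glog) / min(sd_glog)

for mu, s_raw, s_glog in zip(true_levels, sd_raw, sd_glog):
    print(f"mu={mu:8.0f}  sd(raw)={s_raw:10.2f}  sd(glog)={s_glog:.4f}")
print("max/min sd, raw scale: ", ratio_raw)
print("max/min sd, glog scale:", ratio_glog)
```

Under these assumptions the standard deviation on the raw scale grows
with the mean, while on the generalized-log scale it stays roughly
constant across all levels, including mu = 0 where the plain log cannot
be applied to the negative values.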

[1] Chapter 3 of http://handel.cipic.ucdavis.edu/~dmrocke/biolikelihood.pdf

Best regards
  Wolfgang