[BioC] RMA-bimodality:

Tue Jun 6 15:17:29 CEST 2006

Hi Wolfgang (and everybody else)!

As you pointed out, there are two different issues here: a) the 
bi-modality of (GC)RMA-normalized data on many chips (which I have 
now observed repeatedly as well), and b) the bi-modality of log(PM/MM) 
values as stated in the Irizarry et al. paper.

In both cases the mathematical argument holds that any continuous 
distribution can be monotonically transformed into any other continuous 
distribution (which is basically what lies behind your statement that 
monotone transformations do not preserve the number of peaks/modes), 
but I still think that observation a), the bi-modal distribution of 
gcrma-normalized expression values, is worth discussing.
Assuming GCRMA is a good/perfect normalisation method, the normalised 
values should relate directly to the "true" biological expression levels, 
and it is thus tempting to take such a histogram as an indication that 
there are two classes of genes: i) genes with no/small expression values 
(forming the first peak), and ii) truly/highly expressed genes (forming 
the second peak).
If, on the other hand, the bi-modality is an implicit by-product of the 
GCRMA normalisation, it doesn't make sense to interpret the bi-modality 
biologically in that way.
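
To make the monotone-transformation argument concrete, here is a minimal 
sketch in R on simulated data (not real array data). It maps a clearly 
bimodal sample to an approximately standard normal one through the 
empirical distribution function followed by qnorm(), which is a strictly 
increasing map. The helper count_modes(), its 5% height threshold, and the 
choice adjust = 2 are my own illustrative assumptions and not part of 
(GC)RMA or any package.

```r
# Simulated illustration only: a strictly increasing map can change the
# number of modes of a distribution.
set.seed(1)
n <- 20000
x <- c(rnorm(n, mean = 0), rnorm(n, mean = 5))   # clearly bimodal mixture

# Empirical probability integral transform, then the normal quantile
# function: y is a strictly increasing function of x, roughly N(0, 1).
u <- (rank(x) - 0.5) / length(x)
y <- qnorm(u)

# Hypothetical helper: count local maxima of a kernel density estimate,
# ignoring bumps lower than min_rel_height times the highest peak.
count_modes <- function(v, min_rel_height = 0.05) {
  d <- density(v, adjust = 2)                    # extra smoothing (assumption)
  h <- ifelse(d$y >= min_rel_height * max(d$y), d$y, 0)  # flatten tiny tails
  slope <- sign(diff(h))
  slope <- slope[slope != 0]                     # flat steps carry no direction
  sum(diff(slope) == -2)                         # rising-to-falling changes
}

modes_x <- count_modes(x)
modes_y <- count_modes(y)
cat("modes before:", modes_x, " modes after:", modes_y, "\n")
```

With these settings I would expect two modes for x and one for y, although 
the ordering of the values is untouched. This only shows that the number 
of modes by itself cannot settle whether the two peaks are biological.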

I have only limited experience with Affy arrays so far, but in at least 
one case the bi-modality also occurred (though not as clearly) when using 
MAS5 instead of GCRMA, which I took as an indication that, in this case, 
GCRMA didn't create the two modes but just made it easier to 
distinguish between them. I would be interested to hear the experiences 
of others in this respect.

Best Wishes

Claus

Wolfgang Huber wrote:
> Hi,
>
> I am surprised that anybody is surprised about the different number of
> modes ("peaks"): the number of modes of a distribution is not conserved
> under monotone transformations (such as the background correction in
> RMA); this simply follows from the chain rule.
>
> See below for a simple example with some "mock" microarray intensities z
> and the density of the log-transformed values before and after a
> (primitive) background correction.
>
> Cheers
>  Wolfgang
>
>
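> set.seed(123)
>
> n = 100000
> # mock intensities: constant background of 20 plus a two-component
> # log-normal signal
> z = 20 + exp(c(rnorm(n), 3+rnorm(n)))
>
> par(mfrow=c(1,2))
> plot(density(log2(z)))     # before background correction
> plot(density(log2(z-20)))  # after subtracting the background of 20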
>
>
> noel0925 at sbcglobal.net wrote:
>   
>> In the paper "Exploration, Normalization and Summaries
>> of High Density Oligonucleotide Array Probe Level Data"
>> the following statement regarding the
>> bimodality of log2(PM) values and RMA background
>> corrected PM values can be found: "The same bimodal
>> effect is seen when we stratify by log2(PM), thus it
>> is not an artifact of conditioning on sums." (p4).
>> I am a little confused by this, as I thought that
>> it was indeed an artifact of the convolution!
>>
>> Clearly, the background corrected intensity
>> values are given by E(S | O), the conditional
>> expectation of the signal given what we observe, where
>> the observed signal is the convolution of a normally
>> distributed background (B) with mean mu and variance sigma^2
>> (B ~ N(μ, σ^2)) and an exponentially distributed
>> signal (S) with mean alpha (S ~ exp(α)).
>>
>> There have been several postings regarding this matter
>> in the Bioconductor archives and all seem to point to
>> this. Have I misunderstood?
>>
>> In particular, there was the following post:
>> https://stat.ethz.ch/pipermail/bioconductor/2004-August/005908.html
>> (see below for the response from zwu at jhsph.edu):
>>
>> The original question I got was about the bimodal
>> distribution of the gcrma
>> result from probe intensities with a unimodal
>> distribution. My answer was
>> that the "change" was not necessarily surprising.
>>
>> For example, when you have "true log signal" from a bimodal distribution
>> logS=c(rnorm(1000,3,1),rnorm(1000,8,2))
>> # You will see this has two peaks
>> par(mfrow=c(2,2))
>> plot(density(logS))
>> # if the background, log(non-specific binding), comes from
>> logB=rnorm(2000,6,1)
>> # then when you plot the histogram of the convolution in log scale,
>> plot(density(log(exp(logS)+exp(logB))))
>> # you see only one peak, and this would be "before gcrma".
>>
>> This explanation made sense to me, but seems to
>> contradict what is stated in the paper.
>>
>> Also, can someone explain the difference between RMA
>> background version1 vs version2?
>>
>>
>> Best regards,
>> Noel

-- 
***********************************************************************************
 Dr Claus-D. Mayer                    | http://www.bioss.ac.uk
 Biomathematics & Statistics Scotland | email: claus at bioss.ac.uk
 Rowett Research Institute            | Telephone: +44 (0) 1224 716652
 Aberdeen AB21 9SB, Scotland, UK.     | Fax: +44 (0) 1224 715349