[BioC] help in 2-color data normalization

Fri May 11 18:22:03 CEST 2007

Hi Jianping,

> In terms of my previous question of whether or not they could be "real"
> difference existing between the colon cancer and the universal cancer
> cell line RNAs, considerations may be given beyond just removing those
> spots. What I noticed was that some probes can only be hybridized with
> the reference RNAs and some others only with colon cancer samples (see
> "RG_cutoff.jpeg" at <http://www.unc.edu/~jjin/Graph/> ). Take one chip
> as an example, 4548 genes showed green signals more than 2^8 with red
> signals less than 2^6, and 1831 genes showed red signal more than 2^8
> with green signal less than 2^5. In both cases maximum signals, red or
> green, can be as high as 2^12. The observation suggested that there
> exist some real differences between RNAs.

I am not surprised that you can find individual genes that have signal
only in one of the samples, either the reference or the cancer one. In
fact, this is the sort of thing I am usually looking for: genes that
are either silenced or activated in cancer, with respect to a "normal"
reference.
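
As a rough illustration of picking out such spots, here is a minimal
Python sketch. The function name, the toy intensities and the cutoffs
(bright above 2^8, dim below 2^6) are assumptions for illustration
only, loosely following the cutoffs quoted above; sensible values
depend on your scanner and background levels.

```python
def single_channel_spots(red, green, high=2**8, low=2**6):
    """Return (red_only, green_only) lists of spot indices.

    A spot counts as "red only" when its red intensity is above `high`
    while its green intensity is below `low`, and vice versa.
    Both cutoffs are illustrative assumptions, not recommended values.
    """
    red_only = []
    green_only = []
    for i, (r, g) in enumerate(zip(red, green)):
        if r > high and g < low:
            red_only.append(i)
        elif g > high and r < low:
            green_only.append(i)
    return red_only, green_only


# Toy raw intensities for five spots (made up for this example).
red = [3000.0, 40.0, 500.0, 30.0, 1200.0]
green = [35.0, 2800.0, 450.0, 25.0, 50.0]

red_only, green_only = single_channel_spots(red, green)
print("red only:", red_only)
print("green only:", green_only)
```

Counts like these are only a first look: on unnormalised data they mix
real expression differences with dye and intensity effects.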

The plot you're showing does not appear to come from normalised arrays,
in which case you can infer little from the differences in the  
distribution. What it does show is that you have very weak signal on  
both channels on both arrays...

Normalise your data (within arrays, probably using some "flavour" of  
loess), and look at the MA plots: that's a better picture of what's  
going on.
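
For reference, M and A are just the log-ratio and the average
log-intensity of the two channels: M = log2(R/G) and
A = 0.5 * log2(R*G). The Python sketch below computes them and then
median-centres M. Note that median centring is a much simpler stand-in
used only to keep the example short; it is NOT loess, which fits the
correction as a smooth function of A. In Bioconductor you would
normally use something like limma's normalizeWithinArrays() instead.
The function names and toy intensities here are assumptions.

```python
import math
from statistics import median


def ma_values(red, green):
    """Return (M, A) lists for paired red/green intensities.

    M = log2(R/G) is the log-ratio, A = 0.5 * log2(R*G) is the
    average log-intensity. Intensities must be strictly positive.
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def median_center(m):
    """Shift M so its median is zero (global stand-in, not loess)."""
    shift = median(m)
    return [x - shift for x in m]


# Toy array where red is uniformly twice as bright as green,
# i.e. a pure dye bias with no real differential expression.
red = [1024.0, 512.0, 2048.0, 256.0]
green = [512.0, 256.0, 1024.0, 128.0]

m, a = ma_values(red, green)
m_centered = median_center(m)
print(list(zip(a, m_centered)))
```

A constant bias like the one in this toy array is removed by simple
centring; real arrays usually show a bias that changes with A, which
is why an intensity-dependent method such as loess is preferred.
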
In an ideal plot, genes that are only expressed in one sample tend to  
cluster along the left 2 sides of an imaginary diamond... for instance:

http://mcnach.com/MISC/MAplot.png

This is a very unusual MA plot, from an experiment where many many  
many genes are activated (a cell line transfected with a strong  
activator, hybridised against the non-transfected cells).
I drew in red the "imaginary diamond", and numbered 1 and 2 the two
sides I was talking about. Along 1 you get genes that are activated in
one sample (with M>0), and along 2 you would get genes silenced in the
same sample (with M<0).
This experiment is unusual in that it lets you clearly see a "spike"
of activated genes along "1". In most experiments you don't see
anything like that, but that's the area where ideally you'll have this
sort of genes clustering. If there are many genes that only have  
signal in either of your samples, you may see a well populated "cloud"  
around these areas.
Your MA plots seem to me to indicate that this is the case (starting  
from A around 8+, the stuff on the left seems a little artifactual)...
but you really need to dig in deeper if you want some clear answers ;)
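
To see why one-sample genes line up along those two sides, suppose the
"off" channel sits at a roughly constant background floor b. If green
is at the floor, then M = log2(R) - log2(b) and
A = 0.5 * (log2(R) + log2(b)), so M = 2 * (A - log2(b)): a straight
line with slope +2. If red is at the floor you get the mirror image
with slope -2. The Python sketch below checks this on toy numbers; the
floor value of 2^6 and the function names are assumptions, and which
side counts as "activated" depends on which sample is in the red
channel.

```python
import math


def ma_point(red, green):
    """Return (M, A) for one spot: M = log2(R/G), A = 0.5*log2(R*G)."""
    return math.log2(red / green), 0.5 * math.log2(red * green)


def diamond_side(a, floor, red_only=True):
    """Expected M at average intensity `a` when one channel is at `floor`.

    Slope is +2 for red-only spots (side 1) and -2 for green-only
    spots (side 2). `floor` is a hypothetical background intensity.
    """
    slope = 2.0 if red_only else -2.0
    return slope * (a - math.log2(floor))


floor = 64.0                      # assumed background level, 2^6
bright = [256.0, 1024.0, 4096.0]  # toy signals in the "on" channel

side1 = [ma_point(r, floor) for r in bright]  # signal in red only
side2 = [ma_point(floor, g) for g in bright]  # signal in green only

for m, a in side1:
    print(a, m, diamond_side(a, floor, red_only=True))
for m, a in side2:
    print(a, m, diamond_side(a, floor, red_only=False))
```

On real arrays the floor is noisy rather than constant, so instead of
clean lines you get the "cloud" around those sides described above.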

> This raises another question. Is the pooled universal cancer RNA an
> ideal reference? It may create difficulties in explanation of results
> for some genes.

Ideal? It depends on the experiment, I suppose.
It all depends on what questions you're asking. Even very closely  
related samples, from similar tissues, one cancerous and one normal,  
have lots of expression differences. Your answers will of course be  
determined by what comparisons you're making, what references you  
choose, etc. A pooled "universal cancer" RNA can potentially contain  
very different types of cells, etc... which can be good or bad,  
depending on what you're after, really...

Jose

-- 
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK