[BioC] Array normalisation with Limma: would this be reasonable?
J.delasHeras at ed.ac.uk
Mon Dec 4 14:30:21 CET 2006
I am having trouble trying to normalise my data properly.
Briefly, I have a number of 2-colour cDNA arrays. Every slide is
hybridised to 1) a reference sample (non-transfected RNA from a cell
line), and 2) a transfected sample (on the same cell line).
So the question is transfection vs. non-transfection. So far so good.
What's the problem?
The transfection is of a plasmid that will "activate" expression of
many genes (it's a fusion protein between a DNA-binding domain that
would target many gene promoters, especially silenced ones, and a
potent transactivator domain). This means that a large proportion of
genes are differentially expressed, with most going from no or very
little expression, to a clearly detectable level.
This means that loess normalisation doesn't work very well. Actually,
it works "too well". On the raw data, if you plot Cy3 vs Cy5 (logged),
there's the usual diagonal with the bulk of the data, and then a
(usually) large spike with low Cy3 and varying Cy5 (parallel to the
Cy5 axis), or vice versa, depending on how the transfection was labelled.
But then, after print-tip group loess, what I see is that the spike
gets severely distorted, pulled towards the bulk of the data in the
diagonal, and this results in a clear underestimation of the number of
real DE genes.
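
To make the geometry concrete: loess normalisation in limma is fitted on M (log-ratio) against A (average log-intensity), not on the Cy3 vs Cy5 plot itself. Below is a minimal Python sketch of that transformation. The function name and the intensities are made up for illustration; they are not taken from the arrays described here.

```python
import math

def ma_values(cy5, cy3):
    """Return (M, A) for one spot from background-corrected intensities."""
    m = math.log2(cy5) - math.log2(cy3)          # log-ratio
    a = 0.5 * (math.log2(cy5) + math.log2(cy3))  # average log-intensity
    return m, a

# Hypothetical spots: one on the diagonal, one on the "spike"
spots = {"bulk": (1024.0, 1024.0), "spike": (4096.0, 64.0)}
for name, (cy5, cy3) in spots.items():
    m, a = ma_values(cy5, cy3)
    print(name, m, a)
# bulk 0.0 10.0
# spike 6.0 9.0
```

A spot on the spike ends up with a large M at a moderate A, so a curve fitted through all spots at that A is dragged towards it, and subtracting the curve shrinks the spike.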
I'm exploring alternatives, and I had an idea. It seems a bit "rough",
so I wonder what more experienced people think.
This is the idea: I can identify most of the spots on the "spike" by
virtue of their having just about background signal on one channel,
and decent signal on the other. This I can do on the raw data, either
by looking at the foreground and background intensities on each
slide, or at the signal to noise ratio (SNR) that Genepix produces.
Once these are located, I can assign zero weight to them, which means
that the normalisation (loess) is applied using only the bulk of the
spots, that mostly don't change that much.
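
The flagging step could be sketched as follows, in Python for illustration only. The function name and the two thresholds (`low`, `high`) are assumptions chosen for the example, not values from Genepix or limma; they would need tuning per slide.

```python
def spike_weight(fg_cy5, bg_cy5, fg_cy3, bg_cy3, low=1.5, high=3.0):
    """Return 0.0 for a spot near background in one channel and clearly
    above background in the other, otherwise 1.0.

    `low` and `high` are illustrative foreground/background ratio cut-offs.
    """
    r5 = fg_cy5 / bg_cy5
    r3 = fg_cy3 / bg_cy3
    one_sided = (r5 <= low and r3 >= high) or (r3 <= low and r5 >= high)
    return 0.0 if one_sided else 1.0

# Hypothetical spots: (Cy5 foreground, Cy5 background, Cy3 foreground, Cy3 background)
print(spike_weight(5000, 100, 110, 100))   # spike spot
print(spike_weight(2000, 100, 1800, 100))  # expressed in both channels
print(spike_weight(110, 100, 120, 100))    # near background in both channels
```

Spots that are near background in both channels keep weight 1.0 in this sketch; whether to down-weight those as well is a separate decision.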
My hope is that this would remove the distortion of the spike due to
loess, but would still be adequate to "balance" the Cy3 and Cy5
channels.
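
The effect of zero weights can be shown with a toy example. This is only a sketch: it swaps loess for a weighted straight-line fit of M on A to keep the code short, and the data are invented. The point is that zero-weight spots do not influence the fitted curve, but the curve is still subtracted from every spot.

```python
def weighted_line(a, m, w):
    """Weighted least-squares fit of m = intercept + slope * a."""
    sw = sum(w)
    a_bar = sum(wi * ai for wi, ai in zip(w, a)) / sw
    m_bar = sum(wi * mi for wi, mi in zip(w, m)) / sw
    sxx = sum(wi * (ai - a_bar) ** 2 for wi, ai in zip(w, a))
    sxy = sum(wi * (ai - a_bar) * (mi - m_bar) for wi, ai, mi in zip(w, a, m))
    slope = sxy / sxx
    return m_bar - slope * a_bar, slope

def normalise(a, m, w):
    """Subtract the weighted fit from ALL spots, including zero-weight ones."""
    intercept, slope = weighted_line(a, m, w)
    return [mi - (intercept + slope * ai) for ai, mi in zip(a, m)]

# Invented data: four bulk spots with a dye bias of 0.5, two spike spots at M = 6
a = [8.0, 10.0, 12.0, 14.0, 7.0, 7.5]
m = [0.5, 0.5, 0.5, 0.5, 6.0, 6.0]

all_spots = normalise(a, m, [1.0] * 6)               # spike influences the fit
bulk_only = normalise(a, m, [1.0] * 4 + [0.0] * 2)   # spike given zero weight

print(bulk_only[:4])  # bulk centred on zero
print(bulk_only[4:])  # spike keeps its distance from the bulk
print(all_spots[4:])  # spike pulled towards the bulk
```

With zero weights on the spike, the bulk is centred on zero and the spike spots keep their full offset from it. With equal weights, the same spots come out much closer to the bulk, which is the distortion described above.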
I have experimented trying different values for the 'span' parameter
in loess, from the default 0.3 up to 1.0. The higher the span, the
smaller the distortion, although the angle of the spike varies and
it's still not quite right.
In the light of what the raw data scatterplots look like (attachment),
does anyone have objections to my "solution"?
I realise that the best would be to have a set of control spots for
these arrays, but unfortunately I don't have that luxury. I have
identified a small set of genes that do not change expression,
consistently across experiments, even when done in another cell line.
But these are only 7 genes, which cover the effective range of A
values, and I don't think that 7 genes is enough (when I tried limma's
normalisation method 'control' it gave me an error that appears to be
due to too few spots used as controls).
I'd be grateful for any comments.
Dr. Jose I. de las Heras Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology Fax: +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR