[BioC] Array normalisation with Limma: would this be reasonable?
J.delasHeras at ed.ac.uk
Wed Dec 6 15:03:33 CET 2006
Not sure if this would be of interest to anyone... but the approach I
described below seemed to work pretty well. I used weights to exclude
the "spike" from print-tip loess normalisation, and didn't use the
weights to fit the linear model. The scatter plots before/after
normalisation look very reasonable, and the results I am getting make sense.
Quoting J.delasHeras at ed.ac.uk:
> I am having trouble trying to normalise my data properly.
> Briefly, I have a number of 2-colour cDNA arrays. Every slide is
> hybridised to 1) a reference sample (non-transfected RNA from a cell
> line), and 2) a transfected sample (on the same cell line).
> So the question is transfection vs. non-transfection. So far so good.
> What's the problem?
> The transfection is of a plasmid that will "activate" expression of
> many genes (it's a fusion protein between a DNA-binding domain that
> would target many gene promoters, especially silenced ones, and a
> potent transactivator domain). This means that a large proportion of
> genes are differentially expressed, with most going from no or very
> little expression, to a clearly detectable level.
> This means that loess normalisation doesn't work very well. Actually,
> it works "too well". On the raw data, if you plot Cy3 vs Cy5 (logged),
> there's the usual diagonal with the bulk of the data, and then a
> (usually) large spike with low Cy3 and varying Cy5 (parallel to the
> Cy5 axis), or vice versa, depending on how the transfection was labelled.
> (See http://mcnach.com/MISC/RG_scatterplots.png).
> But then, after print-tip group loess, what I see is that the spike
> gets severely distorted, pulled towards the bulk of the data in the
> diagonal, and this results in a clear underestimation of the number of
> real DE genes.
> I'm exploring alternatives, and I had an idea. It seems a bit "rough",
> so I wonder what more experienced people think.
> This is the idea: I can identify most of the spots on the "spike" by
> virtue of their having just about background signal on one channel,
> and decent signal on the other. This I can do on the raw data, either
> by looking at the foreground and background intensities on each
> slide, or at the signal to noise ratio (SNR) that Genepix produces.
> Once these are located, I can assign zero weight to them, which means
> that the normalisation (loess) is applied using only the bulk of the
> spots, that mostly don't change that much.
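
To make the flagging step concrete, here is a minimal sketch in plain Python (not limma code). It assumes raw foreground and background intensities per spot and channel, and it uses foreground/background ratios as a stand-in for Genepix's SNR. The function name `spike_weights` and the thresholds `low` and `high` are hypothetical, chosen only for illustration; real cut-offs would have to be tuned per slide.

```python
def spike_weights(fg_cy3, bg_cy3, fg_cy5, bg_cy5, low=1.5, high=3.0):
    """Return one weight per spot: 0.0 for suspected "spike" spots, else 1.0.

    A spot is flagged when one channel sits near background (fg/bg < low)
    while the other is clearly above it (fg/bg > high). Both thresholds
    are illustrative assumptions. Background values must be > 0.
    """
    weights = []
    for f3, b3, f5, b5 in zip(fg_cy3, bg_cy3, fg_cy5, bg_cy5):
        r3 = f3 / b3  # Cy3 foreground-to-background ratio
        r5 = f5 / b5  # Cy5 foreground-to-background ratio
        on_spike = (r3 < low and r5 > high) or (r5 < low and r3 > high)
        weights.append(0.0 if on_spike else 1.0)
    return weights


# Toy intensities (made up): spots 2 and 3 have one channel at background.
fg_cy3 = [1000, 110, 5000, 2000]
bg_cy3 = [100, 100, 100, 100]
fg_cy5 = [1100, 900, 120, 2100]
bg_cy5 = [100, 100, 100, 100]

weights = spike_weights(fg_cy3, bg_cy3, fg_cy5, bg_cy5)
print(weights)  # [1.0, 0.0, 0.0, 1.0]
```

In limma, a per-spot weight matrix of this kind could then be handed to normalizeWithinArrays through its weights argument, so that zero-weight spots do not contribute to the loess fit.
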
> My hope is that this would remove the distortion of the spike due to
> loess, but would still be adequate enough to "balance" the Cy3 and Cy5
> channels appropriately.
> I have experimented trying different values for the 'span' parameter
> in loess, from the default 0.3 up to 1.0. The higher the span, the
> smaller the distortion, although the angle of the spike varies and
> it's still not quite right.
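
The effect of zero weights on the loess curve can be illustrated with a small self-contained sketch in plain Python. This is not limma's implementation: it is a single, non-robust, tricube-weighted local linear fit over the whole array, with no print-tip grouping. The toy M and A values and the names `weighted_loess` and `normalise` are assumptions made up for the example.

```python
import math


def weighted_loess(x, y, w, span=0.3):
    """Tricube-weighted local linear fit, evaluated at every x.

    Spots with w == 0 do not shape the curve, but they still receive a
    fitted value, so they are corrected along with everything else.
    Assumes at least two spots have non-zero weight.
    """
    used = [i for i in range(len(x)) if w[i] > 0]
    k = max(2, math.ceil(span * len(used)))  # neighbours per local fit
    fitted = []
    for x0 in x:
        near = sorted(used, key=lambda i: abs(x[i] - x0))[:k]
        # Bandwidth: slightly beyond the farthest neighbour, never zero.
        h = max(abs(x[i] - x0) for i in near) * 1.0001 or 1.0
        kernel = {}
        sw = swx = swy = 0.0
        for i in near:
            u = abs(x[i] - x0) / h
            kernel[i] = w[i] * (1.0 - u ** 3) ** 3
            sw += kernel[i]
            swx += kernel[i] * x[i]
            swy += kernel[i] * y[i]
        mx, my = swx / sw, swy / sw  # kernel-weighted means
        sxx = sum(kernel[i] * (x[i] - mx) ** 2 for i in near)
        sxy = sum(kernel[i] * (x[i] - mx) * (y[i] - my) for i in near)
        slope = sxy / sxx if sxx > 1e-12 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def normalise(a, m, w, span):
    """Subtract the fitted loess curve from every M value."""
    curve = weighted_loess(a, m, w, span)
    return [mi - ci for mi, ci in zip(m, curve)]


# Toy data (assumed): bulk spots with a constant dye bias of M = 0.5,
# plus four "spike" spots with M = 4 at low A.
bulk_a = [6.0 + 0.5 * i for i in range(17)]  # A from 6 to 14
spike_a = [7.25, 7.75, 8.25, 8.75]
a = bulk_a + spike_a
m = [0.5] * len(bulk_a) + [4.0] * len(spike_a)
is_spike = [False] * len(bulk_a) + [True] * len(spike_a)

w_zero = [0.0 if s else 1.0 for s in is_spike]  # spike excluded from fit
w_all = [1.0] * len(a)                          # spike included in fit

m_norm_weighted = normalise(a, m, w_zero, span=0.5)
m_norm_unweighted = normalise(a, m, w_all, span=0.5)

for label, result in (("zero weights", m_norm_weighted),
                      ("all weights 1", m_norm_unweighted)):
    print(label, [round(v, 2) for v, s in zip(result, is_spike) if s])
```

With the spike given zero weight, the curve follows only the bulk, so the spike spots keep their full offset from it (3.5 in this toy data). With all weights equal, the curve is pulled towards the spike and the normalised spike values shrink, which is the distortion described above.
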
> In the light of what the raw data scatterplots look like (attachment),
> does anyone have objections to my "solution"?
> I realise that the best would be to have a set of control spots for
> these arrays, but unfortunately I don't have that luxury. I have
> identified a small set of genes that do not change expression,
> consistently across experiments, even when done in another cell line.
> But these are only 7 genes, which cover the effective range of A
> values, and I don't think that 7 genes is enough (when I tried limma's
> normalisation method 'control' it gave me an error that appears to be
> due to too few spots used as controls).
> I'd be grateful for any comments.
> Dr. Jose I. de las Heras Email: J.delasHeras at ed.ac.uk
> The Wellcome Trust Centre for Cell Biology Phone: +44 (0)131 6513374
> Institute for Cell & Molecular Biology Fax: +44 (0)131 6507360
> Swann Building, Mayfield Road
> University of Edinburgh
> Edinburgh EH9 3JR