[BioC] normalisation assumptions (violation of)

Tue Aug 8 19:20:58 CEST 2006

I forgot to add one thing:

On 8/8/06, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
> On 8/8/06, J.delasHeras at ed.ac.uk <J.delasHeras at ed.ac.uk> wrote:
> > Quoting M Perez <perezperezmm at yahoo.es>:
> >
> > > Hi Jose,
> > >
> > > I think you should correct for background since, as you
> > > have commented, you have slides with high background
> > > intensity and you want to remove background bias. I
> > > don't know if you have already tried "normexp".
> >
> > Hi Manuel,
> >
> > I haven't really. I did a long time ago and what put me off was having
> > to search for the right offset, when I was hoping for something a bit
> > more "automatic" (and at the time I used LimmaGUI, which is a bit more
> > tedious if you want to experiment a little). I should try that.
> > However, I notice that the background usually appears to have little or
> > nothing to do with the signals measured. The background tends to be
> > very uniform across the slide, and the fact that I get "negative spots"
> > where you see less signal on the actual spot than around it, makes me
> > think that the cDNA spotted acts as a pretty good block against that
> > general background. In other words, I am not convinced that the
> > background measured on the glass has much to do with the signal I
> > measured on a spot of DNA, and subtracting background may actually be
> > a bad thing to do.
>
> That is a very good statement.  We have to ask ourselves what kind of
> "background" there is, not just define background from what methods we
> have available!  For instance, it is possible to prove scientifically
> that the scanner introduces an offset. It might simply be that the
> image-analysis based background estimators happen to get close to the
> scanner background; that does not mean that the detected signal in the
> proximity of a spot is added to the spot, it just happens to be a good
> proxy to get to the scanner offset.  That is just a hypothesis and in
> general I think that image-background signals are poor and noisy
> estimators of the scanner offset.
>
> > Another reason I think background subtraction doesn't matter much is
> > that on the occasions when I do see some pattern on the background
> > (using 'imageplot' for instance, you can tune the ranges to display to
> > enhance and view those patterns), it often doesn't translate into a
> > pattern when you display the red/green ratios, or the signals on their
> > own. Not always, but quite often, from what I've seen. And when you do
> > get some scratches that clearly affect the signal measured, it might
> > make more sense to flag those spots... or to simply rely on the fact
> > that there should be enough replicates, so an odd measurement should
> > not affect the outcome too much (hopefully if on another slide I have
> > another scratch it will not affect the very same spots again :-)
>
> Agree.
>
> > I think I like Henrik Bengtsson's idea about measuring the background
> > inherent to a particular scanner, and subtracting that instead... but I
> > haven't yet explored that properly (hangs head in shame)... the problem
> > with being a one-man operation is that you're pressed to get results
> > that are "good enough" to continue the biology, rather than spending
> > too much time working out what's the best way to get the most out of the
> > data available. If only I could clone myself... but then I wouldn't
> > like to work with myself... ;-)
> >
> > Right now I am exploring another avenue: repeating those experiments
> > that gave me high background with a view to removing the offending slides
> > and using something of better quality. In this case it's relatively
> > simple, but many times I will not have the luxury, therefore I still
> > want to understand the problem with background better.
>
> Seriously, it is very easy to do scanner calibration.  Much easier
> than repeating experiments.  Also, if the scanner offset is stable
> over time, which I suspect it is, you might only have to do this once
> every now and then, and simply just reuse the same estimate across
> arrays.
>
> Scan the same array at say four different PMTs, e.g. 800V, 700V, 600V
> and 500V.  Keep the array in the scanner between scans to keep
> everything but the PMT as similar as possible.  That way you can reuse
> the spot mask identified by Axon GenePix Pro on the 800V for the other
> images too.  You'll get four GPR files.  Pull out the foreground
> signals for one channel at a time from each of them as a vector,
> e.g. X800, X700, X600, X500, and put them in a matrix
>
>  X <- matrix(c(X800, X700, X600, X500), ncol=4)

Already here you can see whether you've got a scanner offset or not.  Plot
your data pairwise, zoom in at (0,0), and see whether the data points from
the different pairs converge at (0,0) or not:

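# Assumes X is the matrix built above (one column per PMT setting)
par(pch=19)
col <- 1
plot(NA, xlim=c(0,700), ylim=c(0,700), xlab="signal (scan i)", ylab="signal (scan j)")
abline(a=0, b=1)  # diagonal, for reference when zooming in at (0,0)
# Overlay all six scan pairs, one colour per pair
for (ii in 1:3) for (jj in (ii+1):4) points(X[,c(ii,jj)], col=(col <- col + 1))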

See attached image for example.
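
If you want to see what to expect from such a plot before scanning anything,
here is a minimal sketch on simulated data.  It assumes a purely affine
scanner response (a common offset plus a PMT-dependent gain); the offset of
20, the gains and the noise level are made-up values, not measurements.
Under that assumption each pairwise scatter line crosses the diagonal at
(offset, offset) rather than at (0,0), and the crossing point can be read
off a simple line fit:

```r
set.seed(1)
n      <- 2000
x      <- rexp(n, rate = 1/500)        # hypothetical true signals
offset <- 20                            # assumed common scanner offset
gain   <- c(1, 0.5, 0.25, 0.125)       # assumed relative gains, 800V..500V

# One column per scan: offset + gain * signal + a little noise
X <- sapply(gain, function(b) offset + b * x + rnorm(n, sd = 1))
colnames(X) <- c("X800", "X700", "X600", "X500")

# For each of the six scan pairs, fit a line and find where it
# crosses the diagonal y = x, i.e. intercept / (1 - slope)
pairs <- t(combn(4, 2))
crossing <- apply(pairs, 1, function(p) {
  co <- unname(coef(lm(X[, p[2]] ~ X[, p[1]])))
  co[1] / (1 - co[2])
})
print(round(crossing, 2))
```

This is only a quick diagnostic using ordinary least squares; it is not the
estimator used by calibrateMultiscan() and should not replace it.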

/Henrik

>
> Then estimate and calibrate the signals;
>
>  library(aroma.light)
>  Xc <- calibrateMultiscan(X)
>
> 'Xc' will be a single vector of length nrow(X).  The attribute
> 'modelFit' will contain the parameter estimates for that channel, i.e.
> the scanner offset etc.  The scanner offset is in 'adiag', that is
>
>  scannerOffset <- attr(Xc, "modelFit")$adiag
>
> Do the same for the other channel(s).  Single-channel users are done here.
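
To illustrate why the estimated offset matters for two-channel data, here is
a small sketch with made-up numbers: both channels carry the same true
signal but have different scanner offsets (20 and 5 here, purely
hypothetical; in practice they would come from the 'adiag' estimate of each
channel).  Subtracting each channel's offset before taking log-ratios
removes the intensity-dependent bias:

```r
x       <- c(10, 100, 1000, 10000)  # hypothetical true signal, same in both channels
offsetR <- 20                        # assumed scanner offset, red channel
offsetG <- 5                         # assumed scanner offset, green channel

R <- x + offsetR                     # observed foreground, red
G <- x + offsetG                     # observed foreground, green

M_raw <- log2(R / G)                            # log-ratios without correction
M_cal <- log2((R - offsetR) / (G - offsetG))    # log-ratios after offset removal

print(cbind(x, M_raw = round(M_raw, 3), M_cal = round(M_cal, 3)))
```

The uncorrected log-ratios are far from zero at low intensities and shrink
towards zero at high intensities, while the corrected ones are zero
throughout, as they should be for equal true signals.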
>
> FYI: The 'aroma.light' package provides a matrix-only interface to
> calibration/normalization methods.  I have higher-order interfaces in
> 'aroma' off-Bioconductor, but the above should be enough.  When there
> is time (?!?) I'll also provide wrappers to the 'exprSet' class.
>
> /Henrik
>
> >
> > > Anyway, regarding the normalization process, I
> > > think you shouldn't worry so much about violating the
> > > assumption on the number of DE genes in your
> > > normalization. I have worked with experiments similar
> > > to the one you mentioned, using print-tip-loess, and the
> > > results were pretty good.
> >
> > I'm glad to hear that. I had similar comments from other sources, and I
> > must admit that the (very) few controls I had in my experiment seem to
> > behave properly if I apply print-tip-loess (and no bkg correction,
> > because when I do I run into problems, as I mentioned in another thread)
> >
> >
> > > It is true that the normalization process is based on
> > > some assumptions. But no single microarray experiment
> > > fulfils these assumptions...
> > > HTH
> > > Manuel
> >
> > I am aware that loess is pretty robust... I just wasn't sure that it
> > was robust enough in an experiment such as this, where I expect the
> > average median of ratios to be above 1 (although not by much,
> > admittedly).
> >
> > Thanks for all the comments. I will definitely explore the normexp bkg
> > correction method.
> >
> > Jose
> >
> > --
> > Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
> > The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
> > Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
> > Swann Building, Mayfield Road
> > University of Edinburgh
> > Edinburgh EH9 3JR
> > UK
> >
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: scannerOffset.png
Type: image/png
Size: 22436 bytes
Desc: not available
Url : https://stat.ethz.ch/pipermail/bioconductor/attachments/20060808/efae6d9c/attachment.png