[BioC] aroma.light advice sought

Henrik Bengtsson hb at stat.berkeley.edu
Wed Jul 16 20:28:46 CEST 2008

Hi Iain,

On Wed, Jul 16, 2008 at 1:54 AM, Iain Gallagher
<iaingallagher at btopenworld.com> wrote:
> Dear List
> I was wondering if someone could give me some advice on the workflow for the R/Bioconductor package aroma.light. I have 12 Exiqon arrays, stained with Cy3 and scanned at 3 different PMT settings of 250, 300 and 350 (75% laser power) using a Genepix 4200A Autoloader. These settings are based on an initial scan with the PMT set to auto, which scanned each array at ~350.

So, if I understand it correctly, they were scanned in either the
order (350, 300, 250) or (350, 250, 300).  Just for clarification, the
order does not matter, but many people are more comfortable with
having the first scan set to their defaults just in case there is,
say, dye bleaching.  We didn't find dye bleaching to be a problem.

The only thing to be careful about is to not set the PMT too low or
too high.  If too low, the scanner noise will take over and if too
high, scanner saturation/censoring takes place.  Otherwise, the noise
you obtain does indeed scale with the signal, i.e. the exact PMT
setting is not critical as long as you're using "decent" settings.

Also, make sure to read help("1. Calibration and Normalization"),
especially the suggestion that you should keep as much as possible
fixed between scans but the PMT, e.g. avoid washing arrays etc.

> I have read the papers by H. Bengtsson et al from 2004 and 2006 describing the scanner offset problem and the solution (as implemented in aroma). I am, however, rather naive with microarray data handling and I am unsure how to proceed in terms of analysis.
> Specifically, do I need to carry out the calibrateMultiscan.matrix procedure for each triplicate of arrays or can I just proceed to affine normalization? Once I have the normalized data do I back transform this using the backtransformAffine.matrix procedure or do I use the data straight after the normalization?

Consider multiscan calibration to be a step totally independent of
the normalization that follows.  Always do multiscan calibration
*before anything else*.  I prefer to use affine normalization to
normalize between channels, but others prefer curve-fit normalization
("loess") or quantile normalization etc.  What you choose is
independent of the multiscan calibration.  In either case, you never
have to call backtransformAffine() yourself (that's a low-level
method).

Multiscan calibration is a calibration method that is applied to each
hybridization and each channel separately.  Say you have K=3 scans,
each with N signals in both channels.  Take your signals across all
K scans in the first channel and put them in an NxK matrix 'XR'.
Do the same for the other channel(s).  Then do:

XRc <- calibrateMultiscan(XR);
XGc <- calibrateMultiscan(XG);

Now you have two Nx1 matrices with calibrated signals for the red and
the green channels for that hybridization.  That's all you need to do
to "merge" multiple scans for one hybridization.  No parameters to
choose - nothing.  If you want to see the parameter estimates, do:

 fit <- attr(XRc, "modelFit");

The scanner offset is e=fit$adiag[1] and the relative scale (to the
first scan) of each scan is bb=fit$b.  These are denoted e_c and
bb_c = (1, b_2, ..., b_K) in Bengtsson et al. 2004.  It is the scanner
offset that causes problems, not the relative scales.  We see offsets
in the range of 15 to 25 units (out of 65,535).  It would be
interesting to hear back from you what *scanner offsets* you observe
with your scanner and how stable they are across arrays.
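
To make the shape of the input concrete, here is a rough sketch on
simulated data.  Everything about the data is an assumption made for
illustration (N, the offset of 20, the scan scales, the noise level
and the column names); only calibrateMultiscan(), the "modelFit"
attribute, fit$adiag and fit$b come from the text above.  The
calibration step only runs if aroma.light is installed.

```r
# Simulated multiscan data for one channel of one hybridization.
# All numbers below are made up for illustration.
set.seed(1)
N <- 1000                           # number of spots
K <- 3                              # number of scans
z <- 2^runif(N, min = 2, max = 13)  # hypothetical "true" signals
e <- 20                             # hypothetical scanner offset
b <- c(1, 1.8, 3.1)                 # hypothetical relative scan scales

# One column per scan: offset + scale * signal, with multiplicative noise
XR <- sapply(b, function(bk) e + bk * z * 2^rnorm(N, sd = 0.05))
colnames(XR) <- c("PMT250", "PMT300", "PMT350")
stopifnot(all(dim(XR) == c(N, K)))

# Calibrate and inspect the estimates (skipped if aroma.light is missing)
if (requireNamespace("aroma.light", quietly = TRUE)) {
  XRc <- aroma.light::calibrateMultiscan(XR)
  fit <- attr(XRc, "modelFit")
  print(fit$adiag[1])  # estimated scanner offset; compare with 'e'
  print(fit$b)         # estimated relative scales; compare with 'b'
}
```

With real data you would build 'XR' with cbind() from the K scans of
the same array instead of simulating it.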

Foreground and/or background signals? (This question is typically
asked sooner or later.)  First of all, if you, like me, prefer to work
with foreground signals only, then the answer is simple - use only
foreground signals in XR and XG above.  The model underlying the
multiscan calibration method is based on effects that occur in the
scanner and not on the array.  That is, it assumes that every pixel
intensity undergoes the same transform regardless of whether it is a
pixel, say, inside or outside a spot.  Thus, if you add background
signals to your 'XR' and 'XG' above, they should increase the
precision of your estimates.  However, there are typically enough
foreground signals to achieve high-precision estimates anyway, so it
doesn't really make a difference at the end of the day.  There is
also a risk in fitting with background signals, namely that the
background estimates (there are many different methods out there)
might be biased relative to the foreground estimates.  We didn't study
this, so I don't know if it is a real problem.  To summarize, don't
worry: use foreground only if that is what is used downstream, and use
foreground & background if that is what is used downstream.

Finally, if you want to see how strong an effect your scanner
offset has, you can look at the within-channel log-ratios between the
choose(K,2) (=3) scan pairs like this:
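
A minimal sketch of such a plot, using simulated data in place of a
real 'XR' (the offset of 20, the scan scales and the noise level are
made-up values).  It uses base R only and is an illustration, not the
code from the paper:

```r
# Simulated stand-in for an N x K multiscan matrix (values are made up)
set.seed(1)
N <- 1000
z <- 2^runif(N, min = 0, max = 13)
XR <- sapply(c(1, 1.8, 3.1), function(bk) 20 + bk * z * 2^rnorm(N, sd = 0.05))

# All choose(K,2) pairs of scans, one pair per column
pairs <- combn(ncol(XR), 2)

# For each pair (i,j): M = log-ratio, A = average log-intensity
ma <- lapply(seq_len(ncol(pairs)), function(p) {
  i <- pairs[1, p]
  j <- pairs[2, p]
  data.frame(
    A = (log2(XR[, i]) + log2(XR[, j])) / 2,
    M = log2(XR[, j]) - log2(XR[, i]),
    pair = p
  )
})
ma <- do.call(rbind, ma)

# One color per scan pair; the dashed line marks M = 0
plot(ma$A, ma$M, col = ma$pair, pch = ".", xlab = "A", ylab = "M")
abline(h = 0, lty = 2)
```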


If there is a scanner offset, you will see the M-vs-A data points
curve at the lower intensities (cf. Figure 7a in the paper), otherwise
not (Figure 7b).  Actually, you should see them converge to M=0 at
A=log2(e) if you have a scanner offset.  To see if the calibration
controls for this, try (Figure 7c):

XRc2 <- calibrateMultiscan(XR, average=NULL);

Now all pairs should overlap almost perfectly.
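
If you prefer a number to a plot, one rough way to check the overlap
is to compare the typical size of the pairwise log-ratios before and
after calibration.  The sketch below uses simulated data again;
pairwiseM() is a hypothetical helper written here, not part of
aroma.light, and the calibration step is skipped if the package is
not installed.

```r
# Hypothetical helper: log-ratios M for every pair of columns of X
pairwiseM <- function(X) {
  pairs <- combn(ncol(X), 2)
  M <- apply(pairs, 2, function(ij) log2(X[, ij[2]]) - log2(X[, ij[1]]))
  colnames(M) <- apply(pairs, 2, paste, collapse = "-")
  M
}

# Simulated stand-in for an N x K multiscan matrix (values are made up)
set.seed(1)
N <- 1000
z <- 2^runif(N, min = 0, max = 13)
XR <- sapply(c(1, 1.8, 3.1), function(bk) 20 + bk * z * 2^rnorm(N, sd = 0.05))

if (requireNamespace("aroma.light", quietly = TRUE)) {
  # average=NULL keeps all K calibrated scans instead of merging them
  XRc2 <- aroma.light::calibrateMultiscan(XR, average = NULL)
  # Calibrated signals near zero can be non-positive, hence the NaN handling
  Mc <- suppressWarnings(pairwiseM(XRc2))
  print(apply(abs(pairwiseM(XR)), 2, median))      # before calibration
  print(apply(abs(Mc), 2, median, na.rm = TRUE))   # after calibration
}
```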

Let me know if you have any other questions.


> After this step I am fairly confident and can use SAM and limma to investigate differential expression. I would, however like to use the information gathered from across the scans to maximise my data collection opportunity
> Thanks for any advice.
> Iain
