[BioC] Combining data from scans at different intensities

Henrik Bengtsson hb at stat.berkeley.edu
Thu Feb 15 06:52:31 CET 2007


Hi.

On 2/13/07, John Fowler <fowlerj at science.oregonstate.edu> wrote:
>
> Henrik Bengtsson <hb at ...> writes:
>
> >
> > Hi.
> >
> > On 2/13/07, John Fowler <fowlerj at ...> wrote:
> > > Hello,
> > >
> > > I would like to use data extracted from images scanned at 3 different
> > > intensities in our GenePix scanner.  There are a couple of papers
> > > that I could find (Lyng et al 04, Piepho et al 06) that describe
> > > methods to combine these data and thus help deal with problems of
> > > saturation and signals across the dynamic range of the scanner.
> > >
> > > I looked for a way to do this in bioconductor, and found a post from
> > > Dr. Henrik Bengtsson, indicating that this was possible using the
> > > aroma.light package in bioconductor.  However, he indicated that this
> > > should be done with data from scans in which the laser intensity =was
> > > not changed=.
> > >
> > > Unfortunately, my scans used two different laser intensities.
> >
> > So, what were your settings for the three scans?  If two scans have the
> > same laser setting, how does the third scan differ?  Different PMT
> > settings?
> >
> > >
> > > Does this invalidate using aroma.light for this purpose?  Is there
> > > any other Bioconductor package that could deal with my (apparently
> > > incorrectly obtained) data?
> >
> > What we observed from scanning at different sensitivity (=PMT) levels
> > was that the scanner adds an offset to the signals and that this
> > offset is independent of the PMT setting.  We also observed that this
> > offset is more or less constant across arrays (also roughly between
> > channels), indicating that the offset is added either in the PMT
> > (photomultiplier tube) or more likely in the analogue-to-digital
> > electronics just after the PMT.  We observed this in both of the
> > scanners investigated, Axon GenePix 4000A and Agilent G2505A.
> >
> > The multiscan calibration model is applied to each channel separately.
> > Let c in {R,G} index the two channels, and let e_c be the offset in
> > channel c.  Say you do multiple scans k=1,...,K.  Then y_{c,i}^(k)
> > denotes the probe signal in channel c for probe i and scan k.  Let the
> > unknown amount of hybridized sequence on this probe be denoted by
> > x_{c,i}, which is independent of scan k.  To be really precise here,
> > x_{c,i} is the amount of light emitted from probe i entering the PMT.
> > We proposed the model:
> >
> >  y_{c,i}^(k) = a_c^(k) + b_c^(k)*x_{c,i} + eps_{c,i}^(k)
> >                 \approx e_c + b_c^(k)*x_{c,i} + eps_{c,i}^(k)  (*)
> >
> > where eps_{c,i}^(k) is zero-mean noise.  By scanning at various
> > *PMT settings*, we can identify e_c and all of the b_c^(k).  Even
> > better, we get a good estimate of x_{c,i}, the amount of light
> > entering the PMT, so at the end of the day we control for effects
> > in the PMT and the electronics after it.  We strongly believe this
> > is a good model for those effects.
> >
> > Now, if you adjust the laser power, you effectively adjust the amount
> > of light being emitted from each probe too, that is, you can no longer
> > assume that x_{c,i} is constant, but you have x_{c,i}^(m) where
> > m=1,...,M indexes the different *laser levels*.  You may propose a
> > similar model to (*) for laser-adjusted scans, e.g.
> >
> >  x_{c,i}^(m) \approx d_c + g_c^(m)*z_{c,i} + xi_{c,i}^(m)  (**)
> >
> > where now z_{c,i} is the amount of label on the hybridized target on
> > probe i, and x_{c,i}^(m) is the amount of light emitted by this probe
> > at laser level m.  One open question is whether the "laser offset" d_c
> > is constant or depends on m too.
> >
> > Now, if (**) is true, then combining (*) and (**), which are both
> > so-called _affine_ functions, gives another affine function:
> >
> >  y_{c,i}^(k,m) = e_c + b_c^(k)*(d_c + g_c^(m)*z_{c,i} + xi_{c,i}^(m)) + eps_{c,i}^(k)
> >                = e_c + d_c*b_c^(k) + h_c^(k,m)*z_{c,i} + nu_{c,i}^(k,m)  (***)
> >
> > where h_c^(k,m) = b_c^(k)*g_c^(m) is the combined gain and
> > nu_{c,i}^(k,m) = b_c^(k)*xi_{c,i}^(m) + eps_{c,i}^(k) is the combined
> > noise.  Compare Models (***) and (*).  If d_c = 0, then (*) and (***)
> > have the same form, and you can use (*) for your data.  If d_c != 0,
> > then d_c*b_c^(k) must be estimated too.
> >
> > The Y <- calibrateMultiscan(X) in aroma.light applies to Model (*).
> > There is no implementation for Model (***) when d_c != 0, but I would
> > say give it a try.
> >
> > If you want to, I can have a look at your multiscan data for a typical
> > array.  If so, we'll have to figure out a way to transfer three GPR
> > files.
> >
> > Best
> >
> > Henrik
> >
> > >
> > > many thanks!
> > > John
> > >
> > > --
> > > John Fowler                             Associate Professor
> > > Botany and Plant Pathology (BPP) Dept.
> > > 2082 Cordley Hall                        Phone: (541) 737-5307
> > > Oregon State University                  FAX: (541) 737-3573
> > > Corvallis, OR  97331-2902  USA           Email: fowlerj at ...
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor at ...
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> > >
> >
> >
> >
>
>
> Hi Henrik,
>
> thank you very much for the rapid reply!
>
> My three scans are something like this, I don't have the exact numbers right now:
>
> 'low' scan - 80% laser power, PMT at ~350
> 'medium' scan - 80% laser power, PMT at ~400
> 'high' scan - 90% laser power, PMT at ~400
>
>
> In retrospect, I am quietly cursing myself for changing two variables...
>
>
> Anyway, after noting your post, I went back and checked the papers by Lyng et al
> 04 and Piepho et al 06 that I had seen previously, and saw that in both cases
> they also kept the laser power constant and changed the PMT.
>
> I actually scanned some of these slides just this morning, and so have the
> opportunity to go back and re-scan them - sounds like this might be the best
> approach?  Unfortunately, there are some older slides in this experiment for
> which this is not an option.
>
> Also, I must admit that I don't follow the details of the statistical solutions
> you explained.  However, I think I grasp the gist of it.  If I try using
> calibrateMultiscan(X) with my data, how would I know that it was giving me an
> invalid output?

Sorry about all those details.  The summary is that we found that the
scanners are very nearly linear in their measurements except for a small
added offset, which we believe is added on purpose to avoid nonsensical
negative signals.  By doing "PMT" scans we can identify and correct for
this offset.  If it is not corrected for, it will add artifacts to your
data.
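
To make the idea concrete, here is a minimal Python sketch of how two scans of the same array at different PMT settings expose a shared offset.  It is not the method implemented in calibrateMultiscan(); it is a plain least-squares illustration on made-up, noise-free numbers.  The offset of 20, the gains 1.0 and 2.5, and the function names are all assumptions chosen for the example.  If both scans follow y = e + b*x, then the line relating one scan to the other crosses the diagonal at the point (e, e), which is what the code exploits.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_offset(scan_low, scan_high):
    """Estimate the common offset as the point where the fitted line
    scan_high ~ scan_low crosses the diagonal scan_high == scan_low."""
    intercept, slope = fit_line(scan_low, scan_high)
    return intercept / (1.0 - slope)


# Hypothetical noise-free signals: same light x per probe, same offset,
# different PMT gains in the two scans (all numbers are assumptions).
true_light = [10.0, 50.0, 200.0, 1000.0, 5000.0]
offset = 20.0
scan_low = [offset + 1.0 * x for x in true_light]
scan_high = [offset + 2.5 * x for x in true_light]

estimated = estimate_offset(scan_low, scan_high)
# Subtracting the estimated offset recovers signals proportional to x.
calibrated = [y - estimated for y in scan_low]
print(estimated)
```

With real data both scans are noisy and some spots are saturated, so a plain least-squares fit like this one would be biased; the sketch only shows why scans at different PMT settings carry enough information to identify the offset.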

By doing "laser-power" scans we would be able to identify another type
of offset in the scanner, which cannot be detected by PMT scans.  This
other offset may or may not be there.  If it is not there, or is much
smaller than the "PMT offset", you can safely use
calibrateMultiscan().  But if the "laser offset" is of the same order
as or larger than the "PMT offset", a specially designed calibration
method is required.  I do not know of any other available methods that
deal with this.

Having said this, I still suggested that you give it a try.  The
reason is that you will probably get better results than with no
calibration at all.  I think you can do even better though.
I have to see the data in order to be more precise.

>
> Have you looked at either of the papers I referenced above, to see what you
> think of them, and whether the approaches used in those papers would work better
> for my situation?

If you look at the graphs for the GenePix scanner in Lyng et al
(2004), you see that they also observe an offset in the scanner.  In
their paper they report that the "intensities measured ...approaching
a constant value of about 20".  (For the ScanArray scanner they
observe a negative offset, which is interesting.)  This is in the same
range as the offset we detected.  They then conclude that "At spot
intensities ... below 200 (scan 2) the relationships deviated from
linearity".  The main reason for this is that there is an offset of
about 20 in the signals, and below 200 that offset has a serious
impact on ratios and on the log scale, e.g. compare
M=log2((20+50)/(20+100)) = -0.78 to Mcalib=log2(50/100) = -1.  However,
if you do calibrate for the scanner offset, I claim that you will get
a linear relationship between the amount of DNA in the spots and what
you measure down to much weaker signals.  From what I read in Lyng et
al, I believe they would agree with this too.  To deal with saturated
probe signals, I agree with the authors that the median rather than the
mean pixel intensity should be used for the probe signal.  Saturation is
taken care of by our multiscan method in the sense that the estimates
are robust (I can give more arguments but that will mean even more
details).  Lyng et al do not correct for the scanner offset, which is
what you most likely have in your GenePix data.
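
The log-ratio comparison above is easy to reproduce.  The following Python sketch uses the same offset of 20 and the same 50-versus-100 signals as in the text; the function name and the extra intensity levels are assumptions added only for illustration.  It shows that a fixed additive offset distorts the log-ratio of a true two-fold difference most strongly at weak signals.

```python
import math


def log_ratio(red, green, offset=0.0):
    """log2 ratio of two signals that both carry the same additive offset."""
    return math.log2((red + offset) / (green + offset))


# The example from the text: true signals 50 and 100, scanner offset 20.
m_with_offset = log_ratio(50, 100, offset=20.0)
m_calibrated = log_ratio(50, 100)
print(round(m_with_offset, 2), m_calibrated)

# The same two-fold difference at stronger signals (hypothetical levels):
# the distortion caused by the offset shrinks as the signals grow.
for scale in (1, 10, 100):
    red, green = 50 * scale, 100 * scale
    print(red, green, round(log_ratio(red, green, offset=20.0), 3))
```

The log-ratio moves toward -1 as the signals increase, which matches the observation that the apparent deviation from linearity shows up mainly at low spot intensities.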

I am less familiar with the details of Piepho et al (2006) - they
target the problem of saturation, and only mention in the discussion
that offsets could be modeled too.  I cannot remember the reference,
but there is another paper on how to correct saturated signals by
using the relationship between the mean and the median pixel
intensities.  As long as you have some scans where your spots are not
saturated, I would worry less about the saturated spots than about
scanner offset.

There are a few other papers on how to combine two or more scans, but I
do not know of any dealing with the case where both laser and PMT
have been adjusted.

Finally, the people making scanners really know what they are doing;
parts such as the PMTs have been around in other technologies for
many years, and they have been optimized for a long time.  I think we
can trust that the scanners are very "linear" and have a large dynamic
range (much larger than the rest of the microarray process).  However,
the scanner offset is there and must be corrected for, and to the best
of my knowledge it is added on purpose by the scanner manufacturer in
order to avoid nonsense (censored) negative signals (due to noise).
The reason why we see saturated spot signals curve off as we
approach the upper limit of the scanner is most likely not that the
scanner is nonlinear there, but that we are taking the average (mean
or median) of many pixels per spot; Figure 1 in Piepho et al
illustrates this nicely.
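
A small Python sketch can illustrate the averaging argument.  It assumes a 16-bit upper limit of 65535 and a made-up spot whose pixel values are spread evenly within plus or minus 50% of the spot level; neither the spread nor the helper names come from the papers discussed.  Each pixel is perfectly linear until it is clipped, yet the spot mean starts to fall short well before the spot level itself reaches the limit, while the median holds out longer.

```python
import statistics

SATURATION = 65535  # assumed upper limit of a 16-bit scanner image


def spot_pixels(level, spread=0.5, n=101):
    """Hypothetical spot: n pixel values spread evenly in level*(1 +/- spread)."""
    return [level * (1 - spread + 2 * spread * j / (n - 1)) for j in range(n)]


def measured(level):
    """Mean and median of the spot after clipping each pixel at SATURATION."""
    clipped = [min(p, SATURATION) for p in spot_pixels(level)]
    return sum(clipped) / len(clipped), statistics.median(clipped)


for level in (20000, 50000, 70000):
    mean, median = measured(level)
    print(level, round(mean), round(median))
```

At a level of 20000 no pixel is clipped and both summaries equal the spot level.  At 50000 only the brightest pixels are clipped, so the mean drops below 50000 while the median is still exact.  At 70000 more than half of the pixels are clipped and the median sits at the limit too.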


>
> Thank you for your responses, if it seems like it would be worthwhile for me to
> get you my .gpr files, and you can take the time to look at them, I think I
> should be able to figure out a way to post them someplace where you could
> download them.

Yes, it would be great if I could have a look at your laser-adjusted
scans, so I don't have to guess about the effects.

Cheers

Henrik

>
> again, thanks again for your help!
> John
>