[BioC] GCRMA-induced correlations?

Wed Feb 20 17:33:41 CET 2008

On Wed, Feb 20, 2008 at 5:20 AM, Wolfgang Huber <huber at ebi.ac.uk> wrote:
>
>  Hi Henrik,
>
>  a similar phenomenon (discreteness of low intensity values) occurs with
>  image analysis software for other microarray platforms as well, see e.g.
>  the data in the CCl4 package. Just adding 'random noise' to the data
>  seems pretty unsatisfactory. Results may or may not be fine in practice,
>  but for a "model-based" method I'd think the right thing to do is to
>  either fix your model, or the estimation algorithm, or to diagnose lack
>  of model fit; but perhaps not surreptitiously tweaking the data to make
>  it look like you think it ought to.

I second this.

>
>  I realize that this can be a difficult goal in practice, but I prefer
>  functions/packages that avoid ad hoc heuristics which are apparent only
>  when reading the code, and that do what the label (e.g. the accompanying
>  methods paper) says.

I second this one too.

In our case, with the RMA norm+exp correction, the fix was added to "get
things done", and the person who added it was happy enough with the results
(without it the model fit fails).  As a safeguard, the user has to set
'addJitter=TRUE' explicitly to turn it on.
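
To make the idea concrete, here is a minimal sketch of what a jitter step
of this kind could look like.  It is not the aroma.affymetrix code: the
helper name add_jitter, the uniform noise, the width of 1 and the example
intensities are all assumptions made for illustration.  It only shows that
a small amount of seeded noise breaks the ties among integer-valued
intensities while moving each value by at most half a unit.

```python
import random


def add_jitter(values, width=1.0, seed=None):
    """Return a copy of `values` with uniform noise in [-width/2, width/2] added.

    Hypothetical helper for illustration only; the noise distribution and
    scale used by any real 'addJitter' option are not stated in this thread.
    """
    rng = random.Random(seed)  # a fixed seed makes the jitter reproducible
    half = width / 2.0
    return [v + rng.uniform(-half, half) for v in values]


# Made-up integer-valued low intensities with many ties.
signals = [30.0, 31.0, 31.0, 32.0, 31.0, 30.0, 32.0, 31.0]
jittered = add_jitter(signals, width=1.0, seed=42)

n_unique_before = len(set(signals))    # few distinct values: heavy ties
n_unique_after = len(set(jittered))    # ties are broken after jittering
max_shift = max(abs(a - b) for a, b in zip(signals, jittered))
```

Fixing the seed keeps the jittered values identical between runs, which
matters for the reproducibility concern raised in this thread.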

I can see how the real fix falls between two stools: the original RMA
norm+exp was written for a chip type that did not produce
low-intensity discrete signals, and everything worked fine; now
someone comes along and wants to use it for a new chip type with
slightly different properties.  There is also a catch-22 for
aroma.affymetrix here: people ask for perfect reproducibility of the
existing RMA bg correction (and it delivers that), cf. this thread, but it
then also perfectly reproduces the discreteness problem.  Not
providing the bg correction method is not an option.  The 'addJitter'
option provides an ad hoc patch for the problem until someone(?) has time
to come up with a better solution.  It all comes down to time, eh.

...and that is how yet another strange option was born.

Cheers

/Henrik

>
>   Best wishes
>         Wolfgang
>
>
>
>
>  Henrik Bengtsson wrote:
>  > Hi,
>  >
>  > another reason for adding "some noise" is to help the estimation
>  > algorithm to converge when the discreteness of the data dominates at
>  > lower intensities.
>  >
>  > Details: By default, Affymetrix takes the 75% quantile of the pixel
>  > intensities to be the probe signal, which means that if you've got 9
>  > pixels (common with new chip types) it becomes *exactly* the 7th pixel
>  > value (in sorted order).  In other words, the pixel intensities observed
>  > in a CEL file are often "integers" (although they are stored as floats).
>  > At low intensities this discreteness dominates, which you can see as a
>  > "peacock tail" if you do a log-ratio log-intensity plot.
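
A small sketch of the arithmetic above, assuming the 75% quantile is
computed by linear interpolation between order statistics (the pixel
values are made up).  With 9 pixels the quantile position
(n - 1) * 0.75 + 1 is exactly 7, so no interpolation takes place and the
result is one of the observed pixel values; with 10 pixels the position
would be 7.75 and the result would be interpolated.

```python
import statistics

# Hypothetical integer pixel intensities for one probe (9 pixels).
pixels = [212, 198, 205, 230, 201, 219, 208, 224, 215]

# 75% quantile with linear interpolation between order statistics
# (the "inclusive" method; an assumption about the quantile definition).
q75 = statistics.quantiles(pixels, n=4, method="inclusive")[2]

# 1-based position of the quantile among the sorted values: (n - 1) * p + 1.
position = (len(pixels) - 1) * 0.75 + 1   # whole number for n = 9
seventh = sorted(pixels)[6]               # 7th smallest pixel value

# For comparison, the position for a probe with 10 pixels is not a whole
# number, so the quantile would fall between two observed values.
position_10 = (10 - 1) * 0.75 + 1
```
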
>  >
>  > We observed convergence problems for the RMA norm+exp background model
>  > for some data sets (exon arrays; 9 pixels/probe, low intensities)
>  > because of the above.  In order to help out, we have the option to add
>  > "jitter" before fitting the model (in the 'RmaBackgroundCorrection' of
>  > aroma.affymetrix), which seems to help.
>  >
>  > Cheers
>  >
>  > Henrik
>  >
>  >
>  > On Feb 19, 2008 11:56 PM, Pierre Neuvial <pierre.neuvial at curie.fr> wrote:
>  >> Hi Zhijin,
>  >>
>  >> In Lim's paper they also suggest adding some noise to the truncated probes: I believe (and this is my experience as well) that otherwise the truncated probes would have exactly the same signal values, and correlations between low-intensity probes would remain...
>  >>
>  >> Quoting Lim's paper,
>  >>
>  >> "To test our speculations, we reimplemented the GCRMA procedure without adjusting GSB for uninformative probes - i.e. probes that are truncated to m after NSB adjustment. To ensure the lowest intensity rank of these probes, any other probes with GSB-adjusted value less than m were also truncated at m. Finally, an infinitesimal amount of uniformly distributed noise was added to truncated probes to avoid rank-order correlation issues."
>  >>
>  >> Do you plan to add this "noise" as well? If so, how should the noise level be chosen? And what about the reproducibility of GCRMA results? I think this particular issue is related to the recent thread about set.seed() in GCRMA.
>  >>
>  >> Best wishes,
>  >>
>  >> Pierre.
>  >>
>  >>
>  >> Zhijin Wu wrote:
>  >>
>  >>> Yes, to eliminate this artifact, the truncated values will no longer be
>  >>> adjusted in the next release of GCRMA.
>  >>>
>  >>> Jenny Drnevich wrote:
>  >>>> Hi Zhijin,
>  >>>>
>  >>>> A client pointed out a July 2007 article by Lim et al. testing different
>  >>>> normalization/pre-processing methods for their effects on pairwise
>  >>>> correlations between probesets (Bioinformatics 2007 23(13):i282-i288;
>  >>>> doi:10.1093/bioinformatics/btm201; full link below). They reported that
>  >>>> GCRMA introduced severe artificial correlations between probesets; they
>  >>>> looked for a cause and think it's due to truncation of low-intensity values
>  >>>> after Non-Specific Binding adjustment and then the Gene-Specific Binding
>  >>>> adjustment on these truncated values. They also tested a specific
>  >>>> correction to the GCRMA algorithm that appears to prevent the artificial
>  >>>> correlation and suggest that it become an option or even a default in
>  >>>> the R implementation of GCRMA.
>  >>>>
>  >>>> What do you think of this article? Are there any plans to implement
>  >>>> their suggestion?
>  >>>>
>  >>>> Thanks,
>  >>>> Jenny
>  >>>>
>  >>>> Comparative analysis of microarray normalization procedures: effects on
>  >>>> reverse engineering gene networks
>  >>>>
>  >>>> http://bioinformatics.oxfordjournals.org/cgi/content/full/23/13/i282?maxtoshow=&HITS=10&hits=10&RESULTFORMAT=1&andorexacttitle=and&andorexacttitleabs=and&andorexactfulltext=and&searchid=1&FIRSTINDEX=0&sortspec=relevance&volume=23&firstpage=i282&resourcetype=HWCIT&eaf
>  >>>>
>  >>>>
>  >>>> Jenny Drnevich, Ph.D.
>  >>>>
>  >>>> Functional Genomics Bioinformatics Specialist
>  >>>> W.M. Keck Center for Comparative and Functional Genomics
>  >>>> Roy J. Carver Biotechnology Center
>  >>>> University of Illinois, Urbana-Champaign
>  >>>>
>  >>>> 330 ERML
>  >>>> 1201 W. Gregory Dr.
>  >>>> Urbana, IL 61801
>  >>>> USA
>  >>>>
>  >>>> ph: 217-244-7355
>  >>>> fax: 217-265-5066
>  >>>> e-mail: drnevich at uiuc.edu
>  >>>>
>  >>>
>  >>
>  >> _______________________________________________
>  >> Bioconductor mailing list
>  >> Bioconductor at stat.math.ethz.ch
>  >> https://stat.ethz.ch/mailman/listinfo/bioconductor
>  >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>  >>
>  >
>
>
>  --
>  Best wishes
>   Wolfgang
>
>  ------------------------------------------------------------------
>  Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber
>
>