[BioC] GCRMA-induced correlations?

Wolfgang Huber huber at ebi.ac.uk
Wed Feb 20 14:20:04 CET 2008

Hi Henrik,

a similar phenomenon (discreteness of low intensity values) occurs with
image analysis software for other microarray platforms as well, see e.g.
the data in the CCl4 package. Just adding 'random noise' to the data
seems pretty unsatisfactory. Results may or may not be fine in practice,
but for a "model-based" method I'd think the right thing to do is
either to fix your model or the estimation algorithm, or to diagnose lack
of model fit; but perhaps not to surreptitiously tweak the data to make
it look the way you think it ought to.

I realize that this can be a difficult goal in practice, but I prefer
functions/packages that avoid ad hoc heuristics apparent only
when reading the code, and that do what the label (e.g. the accompanying
methods paper) says.

 Best wishes

Henrik Bengtsson wrote:
> Hi,
> another reason for adding "some noise" is to help the estimation
> algorithm to converge when the discreteness of the data dominates at
> lower intensities.
> Details: By default, Affymetrix takes the 75% quantile of the pixel
> intensities to be the probe signal, which means that if you've got 9 pixels
> (common with new chip types) it becomes *exactly* the 7th smallest pixel
> value.  In other words, the pixel intensities observed in a CEL file
> are often "integers" (although they are stored as floats).  At low
> intensities this discreteness dominates, which you can see as a
> "peacock tail" if you do a log-ratio log-intensity plot.
> We observed convergence problems for the RMA norm+exp background model
> for some data sets (exon arrays; 9 pixels/probe, low intensities)
> because of the above.  In order to help out, we have the option to add
> "jitter" before fitting the model (in the 'RmaBackgroundCorrection' of
> aroma.affymetrix), which seems to help.
> Cheers
> Henrik
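
The quantile remark and the jitter idea in Henrik's message can be illustrated with a minimal sketch. It assumes a linear-interpolation quantile definition (the thread does not say which definition Affymetrix uses), and the pixel values, the jitter width of +/-0.5 and the function name quantile_linear are hypothetical. It is not the aroma.affymetrix implementation.

```python
import random


def quantile_linear(values, q):
    """Quantile by linear interpolation between order statistics (assumed definition)."""
    xs = sorted(values)
    h = (len(xs) - 1) * q          # fractional 0-based position in the sorted list
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


# Nine hypothetical integer pixel intensities for one probe.
pixels = [112, 98, 105, 101, 99, 120, 103, 100, 97]

# With 9 pixels, the position is (9 - 1) * 0.75 = 6, i.e. the 7th smallest value,
# so no interpolation happens and the probe signal is itself an integer.
probe_signal = quantile_linear(pixels, 0.75)
seventh_smallest = sorted(pixels)[6]
print(probe_signal, seventh_smallest)

# Hypothetical low-intensity probe signals with many ties.
signals = [30, 31, 30, 32, 31, 30, 33, 31]

# Jitter: add small uniform noise so that tied values become distinct.
# The width (+/-0.5) is an arbitrary choice made for this illustration.
rng = random.Random(1)
jittered = [s + rng.uniform(-0.5, 0.5) for s in signals]
print(len(set(signals)), len(set(jittered)))
```

Whether such jitter is acceptable for a model-based method is exactly the point debated in this thread; the sketch only shows the mechanics.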
> On Feb 19, 2008 11:56 PM, Pierre Neuvial <pierre.neuvial at curie.fr> wrote:
>> Hi Zhijin,
>> In Lim's paper they also suggest adding some noise to truncated probes: I believe (and this is my experience as well) that otherwise they would have exactly the same signal values for truncated probes, and correlations between low intensity probes would remain...
>> Quoting Lim's paper,
>> "To test our speculations, we reimplemented the GCRMA procedure without adjusting GSB for uninformative probes-i.e. probes that are truncated to m after NSB adjustment. To ensure the lowest intensity rank of these probes, any other probes with GSB-adjusted value less than m were also truncated at m. Finally, an infinitesimal amount of uniformly distributed noise was added to truncated probes to avoid rank-order correlation issues."
>> Do you plan to add this "noise" as well? If so, how should the noise level be chosen? And what about the reproducibility of GCRMA results? I think this particular issue is related to the recent thread about set.seed() in GCRMA.
>> Best wishes,
>> Pierre.
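
The truncate-then-add-noise step quoted above, together with the reproducibility question, can be sketched as follows. This is a toy illustration under stated assumptions, not the GCRMA code or Lim et al.'s implementation: the function name truncate_with_noise, the noise scale eps, the seed and the example values are all hypothetical. The thread leaves open how the noise level should be chosen.

```python
import random


def truncate_with_noise(values, m, eps=1e-6, seed=0):
    """Set values at or below m to m, then add uniform noise in [0, eps) to those entries only.

    eps is a hypothetical noise scale; it has to be smaller than the gap between m
    and the smallest untruncated value if truncated probes are to keep the lowest ranks.
    """
    rng = random.Random(seed)      # fixed seed, so repeated runs give identical output
    out = []
    for v in values:
        if v <= m:
            out.append(m + rng.uniform(0.0, eps))
        else:
            out.append(float(v))
    return out


# Hypothetical adjusted intensities for eight probes, with truncation level m.
signal = [3.2, 1.0, 0.4, 7.9, 0.9, 1.0, 5.5, 0.2]
m = 1.0

plain = [max(v, m) for v in signal]            # truncation only: many exact ties at m
noisy_a = truncate_with_noise(signal, m, seed=42)
noisy_b = truncate_with_noise(signal, m, seed=42)

print(len(set(plain)), len(set(noisy_a)))      # distinct values before and after noise
print(noisy_a == noisy_b)                      # same seed, same result
```

Fixing the seed inside the function makes the output repeatable, at the cost of the noise no longer being independent between calls; exposing the seed as an argument leaves that choice to the caller.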
>> Zhijin Wu wrote:
>>> Yes, to eliminate this artifact, the truncated values will no longer be
>>> adjusted in the next release of GCRMA.
>>> Jenny Drnevich wrote:
>>>> Hi Zhijin,
>>>> A client pointed out a July 2007 article by Lim et al. testing different
>>>> normalization/pre-processing methods for their effects on pairwise
>>>> correlations between probesets (Bioinformatics 2007 23(13):i282-i288;
>>>> doi:10.1093/bioinformatics/btm201; full link below). They reported that
>>>> GCRMA introduced severe artificial correlations between probesets; they
>>>> looked for a cause and think it's due to truncation of low-intensity values
>>>> after Non-Specific Binding adjustment, followed by the Gene-Specific Binding
>>>> adjustment on these truncated values. They also tested a specific
>>>> correction to the GCRMA algorithm that appears to prevent the artificial
>>>> correlation and suggest that it become an option or even a default in
>>>> the R implementation of GCRMA.
>>>> What do you think of this article? Are there any plans to implement
>>>> their suggestion?
>>>> Thanks,
>>>> Jenny
>>>> Comparative analysis of microarray normalization procedures: effects on
>>>> reverse engineering gene networks
>>>> http://bioinformatics.oxfordjournals.org/cgi/content/full/23/13/i282?maxtoshow=&HITS=10&hits=10&RESULTFORMAT=1&andorexacttitle=and&andorexacttitleabs=and&andorexactfulltext=and&searchid=1&FIRSTINDEX=0&sortspec=relevance&volume=23&firstpage=i282&resourcetype=HWCIT&eaf
>>>> Jenny Drnevich, Ph.D.
>>>> Functional Genomics Bioinformatics Specialist
>>>> W.M. Keck Center for Comparative and Functional Genomics
>>>> Roy J. Carver Biotechnology Center
>>>> University of Illinois, Urbana-Champaign
>>>> 330 ERML
>>>> 1201 W. Gregory Dr.
>>>> Urbana, IL 61801
>>>> USA
>>>> ph: 217-244-7355
>>>> fax: 217-265-5066
>>>> e-mail: drnevich at uiuc.edu
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber
