[BioC] RMA/QuantileNormalization results difference between oligo and aroma.affymetrix for Hugene

Fri Feb 26 16:37:11 CET 2010

I rather leave bug fixes on the RMA "engine" for Ben Bolstad. Once he
provides a fix, it makes its way to preprocessCore and the C API used
by affy and oligo.

b

On Fri, Feb 26, 2010 at 3:30 PM, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
> On Fri, Feb 26, 2010 at 2:29 PM, Benilton Carvalho
> <beniltoncarvalho at gmail.com> wrote:
>> regarding oligo's implementation of RMA, it's the same as the one in affy.
>
> Just in case there are future bug fixes etc, is it based on the same
> code base or are you utilizing/calling the code base in the affy
> package?  If, say, oligo changes/fixes something in the RMA code, is
> there a risk that the code then will be different from affy?
>
> /Henrik
>
>>
>> b
>>
>> On Fri, Feb 26, 2010 at 12:47 PM, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
>>> Hi,
>>>
>>> you concerns about reproducibility are very important.  Luckily your
>>> observations are based on mistakes as explained below.
>>>
>>> On Fri, Feb 26, 2010 at 12:05 PM, Benilton Carvalho
>>> <beniltoncarvalho at gmail.com> wrote:
>>>> Quantile normalization is already one step in the RMA workflow.
>>>> Therefore, there's no need to normalize the data again once you've
>>>> gone RMA, ie. (regarding oligo) your call
>>>> "normalize.quantiles(exprs(rmadata))" should be dropped.
>>>>
>>>> Using the defaults, rma() in oligo will:
>>>>
>>>> 1) Background correct (via the RMA convolution model)
>>>> 2) Quantile normalize
>>>> 3) Summarize via median-polish.
>>>
>>> Yes, as Benilton points out it seems like you've misunderstood what
>>> RMA does.  The author of RMA (Ben Bolstad) defines the term RMA to
>>> mean the complete preprocessing suite including summarization.
>>>
>>>>
>>>> b
>>>>
>>>> On Fri, Feb 26, 2010 at 10:46 AM, Mikhail Pachkov <pachkov at gmail.com> wrote:
>>>>> Dear All,
>>>>>
>>>>> I am new in microarray analysis and need your expertise.
>>>>> I need to develop procedure for producing expression values from CEL
>>>>> files. Data should processed with RMA and quantile normalized. I have
>>>>> tried two packages - oligo and aroma.affymetrix. Obtained results are
>>>>> quite different form my point of view. Moreover
>>>>> aroma.affymetrix::QuantileNormalization function produce dta which do
>>>>> not look like they were quantile normalized.
>>>
>>> What is 'dta'?
>>>
>>>>>  I have made density plots of data after RMA and after quantile
>>>>> normalization which are available here
>>>>> http://test.swissregulon.unibas.ch/bioc/index.html There are also
>>>>> links to two CEL files I have used.
>>>>>
>>>>> I have a few questions:
>>>
>>> Below, I will take that you mean "RMA background correct" when you say "RMA".
>>>
>>>>> Why RMA results are so different?
>>>
>>> The RMA-style background correction in aroma.affymetrix utilizes
>>> affy::bg.adjust() [and normalizes PM probes only].  I'm not sure what
>>> algorithm/implementation oligo is using for this step, but they should
>>> give very similar corrected probe signals.
>>>
>>>>> Which RMA implementation is correct?
>>>
>>> So, aroma.affymetrix is basically just a wrapper for
>>> affy::bg.adjust(), which I think was the original implementation of
>>> RMA background correction.  I let Benilton comment on the oligo
>>> implementation and it's origin.
>>>
>>>>> Why does quantile normalization in aroma.affymetrics produce two
>>>>> different distributions?
>>>
>>> Because you first run quantile normalization on PMs only, then you
>>> look at the density plot for all (PMs & MMs).  More below.
>>>
>>>>>
>>>>> Thank you in advance!
>>>>>
>>>>> Here are R scripts I have used:
>>>>>
>>>>> ################################
>>>>> #aroma.affymetrix
>>>>> library(aroma.affymetrix);
>>>>> verbose <- Arguments$getVerbose(-8, timestamp=TRUE);
>>>>>
>>>>> # read files
>>>>> cdf <- AffymetrixCdfFile('annotationData/chipTypes/HuGene-1_0-st-v1/HuGene-1_0-st-v1.cdf');
>>>>> cs <- AffymetrixCelSet$byPath("rawData/mine/HuGene-1_0-st-v1/");
>>>
>>> Have to bring it up: Please, do not setup your annotation data and and
>>> data sets like this. An aroma.affymetrix script should not contain
>>> paths/pathnames, cf. "Dos and Don'ts":
>>>
>>>  http://aroma-project.org/node/102
>>>
>>> The correct way to do the above is:
>>>
>>> cs <- AffymetrixCelSet$byName("mine", chipType="HuGene-1_0-st-v1");
>>>
>>> alternatively, if you wish to be explicit in what CDF is used, you can do:
>>>
>>> cdf <- AffymetrixCdfFile$byChipType("HuGene-1_0-st-v1");
>>> cs <- AffymetrixCelSet$byName("mine", cdf=cdf);
>>>
>>>>>
>>>>> # RMA
>>>>> bc <- RmaBackgroundCorrection(cs);
>>>>> csBC <- process(bc,verbose=verbose);
>>>>>
>>>>> # QuantileNormalization
>>>>> qn <- QuantileNormalization(csBC, typesToUpdate="pm");
>>>>> csN <- process(qn);
>>>
>>> Note, the argument 'typesToUpdate' says that it is only PM probes that
>>> will be updated. The default is that MMs are left "as is".
>>>
>>>>>
>>>>> # Plots
>>>>> image_file <- ("aroma.affymetrix.RMA.png");
>>>>> png(image_file,width=1028,height=768);
>>>>> plotDensity(csBC);
>>>
>>> Here you are plotting all probes on the array.  Since
>>> RmaBackgroundCorrection is only correcting PM probes, you probably
>>> want to do:
>>>
>>> plotDensity(csBC, types="pm");
>>>
>>>>> title("aroma.affymetrix RMA data");
>>>>> dev.off();
>>>>>
>>>>> image_file <- ("aroma.affymetrix.QN.png");
>>>>> png(image_file,width=1028,height=768);
>>>>> plotDensity(csN);
>>>
>>> plotDensity(csN, types="pm");
>>>
>>> This is the key to why you get different density plots.  For a
>>> thorough explaination of the various QN approaches, see Page
>>> 'Empirical probe-signal densities and rank-based quantile
>>> normalization':
>>>
>>>  http://aroma-project.org/node/141
>>>
>>>>> title("aroma.affymetrix QN data");
>>>>> dev.off()
>>>
>>> What you haven't compared yet, because you misunderstood the RMA
>>> pipeline, are the summarized probe signals from fitting the
>>> log-additive RMA model.
>>>
>>> FYI, it is part of our (24 hours) redundancy testing to assert that
>>> the aroma.affymetrix RMA pipeline can reproduce the RMA pipeline and
>>> RMA summary estimates of the affyPLM package.  You can see how well
>>> this is done on Page 'Replication: RMA (background, normalization &
>>> summarization)':
>>>
>>>  http://www.aroma-project.org/replication/RMA
>>>
>>> Hope this helps.
>>>
>>> Henrik
>>>
>>>>> ################################
>>>>>
>>>>> ################################
>>>>> # oligo
>>>>> library(oligo);
>>>>> rawdata=read.celfiles(c("rawData/mine/HuGene-1_0-st-v1/sample1.CEL","rawData/mine/HuGene-1_0-st-v1/sample2.CEL"));
>>>>> rmadata=rma(rawdata);
>>>>> qndata=normalize.quantiles(exprs(rmadata))
>>>>>
>>>>> library(affy)
>>>>> # Plots
>>>>> image_file <- ("oligo.RMA.png");
>>>>> png(image_file,width=1028,height=768);
>>>>> plotDensity(exprs(rmadata));
>>>>> title("oligo RMA data");
>>>>> dev.off();
>>>>>
>>>>> image_file <- ("oligo.QN.png");
>>>>> png(image_file,width=1028,height=768);
>>>>> plotDensity(qndata);
>>>>> title("oligo QN data");
>>>>> dev.off()
>>>>> ###############################
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> Mikhail Pachkov
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>
>