[BioC] RMA/QuantileNormalization results difference between oligo and aroma.affymetrix for Hugene

Fri Feb 26 16:30:57 CET 2010

On Fri, Feb 26, 2010 at 2:29 PM, Benilton Carvalho
<beniltoncarvalho at gmail.com> wrote:
> regarding oligo's implementation of RMA, it's the same as the one in affy.

Just in case there are future bug fixes etc, is it based on the same
code base or are you utilizing/calling the code base in the affy
package?  If, say, oligo changes/fixes something in the RMA code, is
there a risk that the code then will be different from affy?

/Henrik

>
> b
>
> On Fri, Feb 26, 2010 at 12:47 PM, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
>> Hi,
>>
>> you concerns about reproducibility are very important.  Luckily your
>> observations are based on mistakes as explained below.
>>
>> On Fri, Feb 26, 2010 at 12:05 PM, Benilton Carvalho
>> <beniltoncarvalho at gmail.com> wrote:
>>> Quantile normalization is already one step in the RMA workflow.
>>> Therefore, there's no need to normalize the data again once you've
>>> gone RMA, ie. (regarding oligo) your call
>>> "normalize.quantiles(exprs(rmadata))" should be dropped.
>>>
>>> Using the defaults, rma() in oligo will:
>>>
>>> 1) Background correct (via the RMA convolution model)
>>> 2) Quantile normalize
>>> 3) Summarize via median-polish.
>>
>> Yes, as Benilton points out it seems like you've misunderstood what
>> RMA does.  The author of RMA (Ben Bolstad) defines the term RMA to
>> mean the complete preprocessing suite including summarization.
>>
>>>
>>> b
>>>
>>> On Fri, Feb 26, 2010 at 10:46 AM, Mikhail Pachkov <pachkov at gmail.com> wrote:
>>>> Dear All,
>>>>
>>>> I am new in microarray analysis and need your expertise.
>>>> I need to develop procedure for producing expression values from CEL
>>>> files. Data should processed with RMA and quantile normalized. I have
>>>> tried two packages - oligo and aroma.affymetrix. Obtained results are
>>>> quite different form my point of view. Moreover
>>>> aroma.affymetrix::QuantileNormalization function produce dta which do
>>>> not look like they were quantile normalized.
>>
>> What is 'dta'?
>>
>>>>  I have made density plots of data after RMA and after quantile
>>>> normalization which are available here
>>>> http://test.swissregulon.unibas.ch/bioc/index.html There are also
>>>> links to two CEL files I have used.
>>>>
>>>> I have a few questions:
>>
>> Below, I will take that you mean "RMA background correct" when you say "RMA".
>>
>>>> Why RMA results are so different?
>>
>> The RMA-style background correction in aroma.affymetrix utilizes
>> affy::bg.adjust() [and normalizes PM probes only].  I'm not sure what
>> algorithm/implementation oligo is using for this step, but they should
>> give very similar corrected probe signals.
>>
>>>> Which RMA implementation is correct?
>>
>> So, aroma.affymetrix is basically just a wrapper for
>> affy::bg.adjust(), which I think was the original implementation of
>> RMA background correction.  I let Benilton comment on the oligo
>> implementation and it's origin.
>>
>>>> Why does quantile normalization in aroma.affymetrics produce two
>>>> different distributions?
>>
>> Because you first run quantile normalization on PMs only, then you
>> look at the density plot for all (PMs & MMs).  More below.
>>
>>>>
>>>> Thank you in advance!
>>>>
>>>> Here are R scripts I have used:
>>>>
>>>> ################################
>>>> #aroma.affymetrix
>>>> library(aroma.affymetrix);
>>>> verbose <- Arguments$getVerbose(-8, timestamp=TRUE);
>>>>
>>>> # read files
>>>> cdf <- AffymetrixCdfFile('annotationData/chipTypes/HuGene-1_0-st-v1/HuGene-1_0-st-v1.cdf');
>>>> cs <- AffymetrixCelSet$byPath("rawData/mine/HuGene-1_0-st-v1/");
>>
>> Have to bring it up: Please, do not setup your annotation data and and
>> data sets like this. An aroma.affymetrix script should not contain
>> paths/pathnames, cf. "Dos and Don'ts":
>>
>>  http://aroma-project.org/node/102
>>
>> The correct way to do the above is:
>>
>> cs <- AffymetrixCelSet$byName("mine", chipType="HuGene-1_0-st-v1");
>>
>> alternatively, if you wish to be explicit in what CDF is used, you can do:
>>
>> cdf <- AffymetrixCdfFile$byChipType("HuGene-1_0-st-v1");
>> cs <- AffymetrixCelSet$byName("mine", cdf=cdf);
>>
>>>>
>>>> # RMA
>>>> bc <- RmaBackgroundCorrection(cs);
>>>> csBC <- process(bc,verbose=verbose);
>>>>
>>>> # QuantileNormalization
>>>> qn <- QuantileNormalization(csBC, typesToUpdate="pm");
>>>> csN <- process(qn);
>>
>> Note, the argument 'typesToUpdate' says that it is only PM probes that
>> will be updated. The default is that MMs are left "as is".
>>
>>>>
>>>> # Plots
>>>> image_file <- ("aroma.affymetrix.RMA.png");
>>>> png(image_file,width=1028,height=768);
>>>> plotDensity(csBC);
>>
>> Here you are plotting all probes on the array.  Since
>> RmaBackgroundCorrection is only correcting PM probes, you probably
>> want to do:
>>
>> plotDensity(csBC, types="pm");
>>
>>>> title("aroma.affymetrix RMA data");
>>>> dev.off();
>>>>
>>>> image_file <- ("aroma.affymetrix.QN.png");
>>>> png(image_file,width=1028,height=768);
>>>> plotDensity(csN);
>>
>> plotDensity(csN, types="pm");
>>
>> This is the key to why you get different density plots.  For a
>> thorough explaination of the various QN approaches, see Page
>> 'Empirical probe-signal densities and rank-based quantile
>> normalization':
>>
>>  http://aroma-project.org/node/141
>>
>>>> title("aroma.affymetrix QN data");
>>>> dev.off()
>>
>> What you haven't compared yet, because you misunderstood the RMA
>> pipeline, are the summarized probe signals from fitting the
>> log-additive RMA model.
>>
>> FYI, it is part of our (24 hours) redundancy testing to assert that
>> the aroma.affymetrix RMA pipeline can reproduce the RMA pipeline and
>> RMA summary estimates of the affyPLM package.  You can see how well
>> this is done on Page 'Replication: RMA (background, normalization &
>> summarization)':
>>
>>  http://www.aroma-project.org/replication/RMA
>>
>> Hope this helps.
>>
>> Henrik
>>
>>>> ################################
>>>>
>>>> ################################
>>>> # oligo
>>>> library(oligo);
>>>> rawdata=read.celfiles(c("rawData/mine/HuGene-1_0-st-v1/sample1.CEL","rawData/mine/HuGene-1_0-st-v1/sample2.CEL"));
>>>> rmadata=rma(rawdata);
>>>> qndata=normalize.quantiles(exprs(rmadata))
>>>>
>>>> library(affy)
>>>> # Plots
>>>> image_file <- ("oligo.RMA.png");
>>>> png(image_file,width=1028,height=768);
>>>> plotDensity(exprs(rmadata));
>>>> title("oligo RMA data");
>>>> dev.off();
>>>>
>>>> image_file <- ("oligo.QN.png");
>>>> png(image_file,width=1028,height=768);
>>>> plotDensity(qndata);
>>>> title("oligo QN data");
>>>> dev.off()
>>>> ###############################
>>>>
>>>> Kind regards,
>>>>
>>>> Mikhail Pachkov
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>