[BioC] median polish vs mas

Mon May 17 18:25:13 CEST 2004

Dear Naomi,

I think we are talking about two different things here. Your question
appears to be whether or not rma is a reasonable method for computing
expression values, and you appear not to distinguish between justRMA and
rma. My statement is directed towards the purpose of justRMA.

To answer your question, I personally like rma, and I am not convinced
that there is any over-normalization occurring by doing a quantile
normalization followed by medianpolish. I have tried pretty much
everything out there, and I have yet to find a method for computing
expression values that I think does a better job in general use. This is
based primarily on how well a given method works with the affy spike-in
and GeneLogic dilution data sets (I have had arguments with other
statisticians who think that rma only works as well as it does with
these data sets because it has been specifically 'tuned' for them. If
so, my hat is off to Rafael and Ben for their ability to come up with an
algorithm that can magically pick the 16 spiked-in genes out of the
18,000 or so other genes...).

For a variety of reasons, not the least of which is the fact that rma
'beats' most other methods, rma has sort of become the canonical method
for computing expression values for Affy data. It has been implemented
in other non-BioC packages such as GeneSpring, etc., and although I
haven't seen anything concrete, I would bet dollars to donuts that the
Affy PLIER algorithm is simply rma by another name. I think this is why
your reviewer wants to know why you are doing quantile normalization
followed by Tukey's biweight instead of what he/she would consider to be
the 'usual' method.

Now to the point I was originally trying to make. One of the problems
that people encounter with rma is the fact that you first have to create
an AffyBatch with all of your chips, and then compute expression values
which are stored in an exprSet. This can take a huge amount of RAM, and
people with maybe 512 MB of RAM (which is plenty for the vast majority
of things you will ever do on a computer) were running out of memory
with a relatively small number of chips. Rafael noted that a
modification could be made to rma that would use much less memory, and
with his help I wrote the original justRMA. This function was designed
for one purpose only: to allow people with less RAM to be able to do
rma.

The decision to use medianpolish wasn't arbitrary at all; justRMA is
designed to give the exact same results as rma (which of course uses
medianpolish to compute expression values), so by default I had to use
medianpolish.
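
For readers who have not seen the summarization step spelled out, here
is a minimal sketch of Tukey's median polish applied to one probe set.
It is written in plain Python purely for illustration and is not the
code used by rma or justRMA. The function names (median_polish,
summarize_probeset) and the max_iter and tol defaults are assumptions
made for this example, and the input is assumed to be a probes x arrays
table of background-corrected, normalized log2 PM intensities.

```python
from statistics import median


def median_polish(y, max_iter=10, tol=0.01):
    """Fit y[i][j] ~ overall + row[i] + col[j] by alternating median sweeps.

    y is a list of rows (probes); each row holds one value per array.
    Returns (overall, row_effects, col_effects, residuals).
    """
    resid = [[float(v) for v in r] for r in y]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row = [0.0] * n_rows
    col = [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep the median of each row (probe) into the row effects.
        for i in range(n_rows):
            d = median(resid[i])
            resid[i] = [v - d for v in resid[i]]
            row[i] += d
        d = median(col)
        col = [c - d for c in col]
        overall += d
        # Sweep the median of each column (array) into the column effects.
        for j in range(n_cols):
            d = median(resid[i][j] for i in range(n_rows))
            for i in range(n_rows):
                resid[i][j] -= d
            col[j] += d
        d = median(row)
        row = [r - d for r in row]
        overall += d
        # Stop once the total absolute residual no longer changes much.
        new_sum = sum(abs(v) for r in resid for v in r)
        if new_sum == 0 or abs(new_sum - old_sum) < tol * new_sum:
            break
        old_sum = new_sum
    return overall, row, col, resid


def summarize_probeset(log2_pm):
    """One expression value per array: overall level plus array effect."""
    overall, _, col, _ = median_polish(log2_pm)
    return [overall + c for c in col]
```

Note that the probe (row) effects are estimated from all arrays at
once, so this summary uses information across arrays, unlike a
per-array summary such as Tukey's biweight.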

Best,

Jim

James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

>>> Naomi Altman <naomi at stat.psu.edu> 05/17/04 11:33AM >>>
Dear Jim,

The reason I ask is that I have been using expresso with "mas".  But I
recently had a paper returned with the comment that median polish was
"known to be better".  If so, I should probably use it.  The reviewer
appears to have based his/her remarks on the fact (mentioned in the
review) that median polish is the "default".

If the decision to use median polish in justRMA was arbitrary, I would
like to know this, since I am currently in the process of redoing all of
the statistical analyses and tables in the paper (which is pretty
time-consuming).  The main reason we are redoing everything, rather than
defending our decision to use "mas" is that I certainly have no evidence
that Tukey's biweight is "better" except for the heuristic about
over-normalization, and I figured in the long run we will have fewer
arguments with reviewers if we use the default. 

I should not have said that median polish is the "default" in justRMA,
since it is the only method available, but I do think that its use in
justRMA is an endorsement meaning that anyone doing anything besides
Affy-type MAS5 or justRMA or justGCRMA (if this is available) is going
to be asked to justify what they are doing with more stringency.

--Naomi

At 10:49 AM 5/17/2004, James MacDonald wrote:
The default (and only) option for justRMA is medianpolish because
justRMA is designed to *just* do *RMA*, which is a quantile
normalization followed by medianpolish. The only reason justRMA exists
is to allow people with less RAM to be able to do rma.
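
As a rough illustration of the quantile normalization step named above,
the following plain-Python sketch forces every array to share the same
empirical distribution. It is not the affy implementation: the function
name quantile_normalize is an assumption made for this example, the
input is assumed to be one list of intensities per array (all the same
length), and ties are broken by position rather than averaged.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (one list of values per array)."""
    n = len(arrays[0])
    # For each array, the indices of its values from smallest to largest.
    orders = [sorted(range(n), key=lambda i: arr[i]) for arr in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [
        sum(arr[order[k]] for arr, order in zip(arrays, orders)) / len(arrays)
        for k in range(n)
    ]
    # Replace each value by the reference value for its rank.
    normalized = []
    for order in orders:
        out = [0.0] * n
        for k, i in enumerate(order):
            out[i] = reference[k]
        normalized.append(out)
    return normalized
```

After this step, sorting any one array gives the same list of values as
sorting any other; only the assignment of those values to probes
differs between arrays.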

If you think a quantile normalization followed by Tukey's biweight will
do better than rma, you can certainly do that using the expresso()
function.
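
For comparison with median polish, here is a minimal plain-Python
sketch of a one-step Tukey's biweight summary. It is an illustration
only, not the code behind expresso() or MAS 5.0. The function names
(tukey_biweight, summarize_per_array) are assumptions made for this
example, the constants c = 5 and eps = 0.0001 are the values commonly
cited for MAS 5.0 and should be treated as assumptions here, and the
input is assumed to be a probes x arrays table of log2 values for one
probe set.

```python
from statistics import median


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey's biweight: a robust weighted mean of the values."""
    m = median(values)
    # Median absolute deviation from the median, used as the scale.
    s = median(abs(v - m) for v in values)
    weights = []
    for v in values:
        u = (v - m) / (c * s + eps)
        # Points far from the median (|u| >= 1) get zero weight.
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def summarize_per_array(probeset):
    """Summarize each array (column) on its own, ignoring the other arrays."""
    n_arrays = len(probeset[0])
    return [
        tukey_biweight([probe_row[j] for probe_row in probeset])
        for j in range(n_arrays)
    ]
```

Each array's value is computed from that array's probes alone, which is
the practical difference from median polish, where probe effects are
estimated across all arrays.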

Best,

Jim

James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

>>> Naomi Altman <naomi at stat.psu.edu> 05/17/04 10:15AM >>>
I have been wondering why the default in justRMA is
summary.method="medianpolish" instead of "mas", which is Tukey's
biweight.  Since we are already doing quantile normalization, doesn't
the extra between-array step imposed by median polish give the
possibility of masking differential expression?

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch 
https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor 
