[BioC] RMA/median polish question

James MacDonald jmacdon at med.umich.edu
Thu Sep 25 15:31:16 MEST 2003

Hi All,

I have a question about the implementation of medpolish in RMA. The
algorithm involves repeated subtractions of row medians and column
medians from a matrix of probe intensity values.

The problem I have noticed is that if you have an odd number of chips
(especially if you only have three chips), you will end up with an
inordinantly high percentage of expression values that are identical in
all three chips. We are talking about 25-30% of the genes when using
three chips. This is due to the fact that the initial subtraction of row
medians results in so many zeros in the matrix that the column medians
are then zero. Since the expression value is the overall median plus the
column median for that chip, the expression value for that gene will be
identical for all chips.

If we change the median polish algorithm to subtract column medians
first, we don't have this problem, and the expression values are not
much different from what we get using the usual algorithm.

Now I realize that this is more of a philosophical problem rather than
a real problem, because it is unlikely that any of  the expression
values we are talking about would be considered 'differentially
expressed'. However, this does appear to me to be an unintended
consequence of using the current implementation, and the fix is a
trivial change in the code for RMA:

In the function median_polish, change

for (iter = 1; iter <= maxiter; iter++){
get_row_median(z,rdelta,nprobes,cols);
subtract_by_row(z,rdelta,nprobes,cols);
rmod(r,rdelta,nprobes);
delta = median(c,cols);
for (j = 0; j < cols; j++){
c[j] = c[j] - delta;
}
t = t + delta;
get_col_median(z,cdelta,nprobes,cols);
subtract_by_col(z,cdelta,nprobes,cols);
cmod(c,cdelta,cols);
delta = median(r,nprobes);
for (i =0; i < nprobes; i ++){
r[i] = r[i] - delta;
}
t = t+delta;
newsum = sum_abs(z,nprobes,cols);
if (newsum == 0.0 || fabs(1.0 - oldsum/newsum) < eps)
break;
oldsum = newsum;

To

for (iter = 1; iter <= maxiter; iter++){
get_col_median(z,cdelta,nprobes,cols);
subtract_by_col(z,cdelta,nprobes,cols);
cmod(c,cdelta,cols);
delta = median(r,nprobes);
for (i =0; i < nprobes; i ++){
r[i] = r[i] - delta;
}
t = t + delta;
get_row_median(z,rdelta,nprobes,cols);
subtract_by_row(z,rdelta,nprobes,cols);
rmod(r,rdelta,nprobes);
delta = median(c,cols);
for (j = 0; j < cols; j++){
c[j] = c[j] - delta;
}
t = t+delta;
newsum = sum_abs(z,nprobes,cols);
if (newsum == 0.0 || fabs(1.0 - oldsum/newsum) < eps)
break;
oldsum = newsum;

This appears to me to be a reasonable thing to do, but I am curious
what others think.

Regards,

Jim

James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623