[BioC] Re: Use of RMA in increasingly-sized datasets

Swati Ranade ssr3072 at yahoo.com
Mon Jun 6 15:27:21 CEST 2005


Hi Kawai,

I think you might be able to generate a dummy training
set from published microarray data!

Swati
-------------------------------------------------------

I am facing exactly the same problem now.

I think there are two key steps in RMA that depend on the set of chips
in a run: the quantile normalization step and the median polish
summarization step. The target value for each quantile of probe
intensity is the geometric mean calculated from the same quantiles
across the entire chip set in the run, and the expression values
summarized from the 11-20 probe intensities of each probe set are
calculated by the median polish algorithm using the probe sets across
the entire chip set.
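To make the chip-set dependence of the normalization step concrete, here is a minimal pure-Python sketch of quantile normalization (a real analysis would of course use rma() from Bioconductor's affy package; the function name and the use of the arithmetic mean as the quantile target are assumptions for illustration, and substituting the geometric mean described above is a one-line change):

```python
def quantile_normalize(chips):
    """Quantile-normalize a set of chips (lists of probe intensities of
    equal length): each chip's k-th smallest value is replaced by the
    mean of the k-th smallest values across all chips in the set."""
    n = len(chips[0])
    sorted_chips = [sorted(c) for c in chips]
    # Target value for each quantile: mean across chips at that rank.
    # This is where the result depends on the whole chip set.
    targets = [sum(sc[k] for sc in sorted_chips) / len(chips)
               for k in range(n)]
    normalized = []
    for chip in chips:
        # Order of probe indices by intensity within this chip
        ranks = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, idx in enumerate(ranks):
            out[idx] = targets[rank]
        normalized.append(out)
    return normalized, targets

norm, targets = quantile_normalize([[2.0, 4.0, 6.0], [3.0, 1.0, 5.0]])
# targets is [1.5, 3.5, 5.5]; the second chip becomes [3.5, 1.5, 5.5]
```

Adding or removing a chip changes `targets`, which is why every previously normalized chip would, strictly speaking, need to be re-normalized.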

Therefore, the suggested use of a "standard training 50-chip set" is
effective in practice, because the fluctuation of the quantile target
values is quite small after adding one chip's data to the 50-chip
standard set, and the median values used in the summarization step are
robust enough for the resulting 51-chip data set.
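The summarization step the paragraph above refers to is Tukey's median polish, fit to a probes-by-chips matrix of log intensities; the chip expression value is the overall effect plus that chip's column effect. A minimal sketch, assuming a plain probes-by-chips list-of-lists layout (function names are illustrative; the real implementation is inside affy's rma()):

```python
def median_polish(matrix, n_iter=10):
    """Tukey's median polish on a probes-x-chips matrix.
    Returns (overall, row_effects, col_effects, residuals).
    The expression value for chip j is overall + col_effects[j]."""
    nrow, ncol = len(matrix), len(matrix[0])
    res = [row[:] for row in matrix]
    row_eff = [0.0] * nrow
    col_eff = [0.0] * ncol
    overall = 0.0

    def med(xs):
        s = sorted(xs)
        m = len(s) // 2
        return s[m] if len(s) % 2 else (s[m - 1] + s[m]) / 2.0

    for _ in range(n_iter):
        for i in range(nrow):  # sweep row medians into row effects
            m = med(res[i])
            row_eff[i] += m
            res[i] = [v - m for v in res[i]]
        m = med(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
        for j in range(ncol):  # sweep column medians into chip effects
            m = med([res[i][j] for i in range(nrow)])
            col_eff[j] += m
            for i in range(nrow):
                res[i][j] -= m
        m = med(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
    return overall, row_eff, col_eff, res
```

Because medians are insensitive to a single added column, one extra chip barely moves the column effects of the existing chips, which is the robustness argument made above.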

But this method is very tedious when we process several chips' data one
by one, and creating the standard set is impossible at the beginning of
a project.
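For the one-by-one case, the workaround implied by the standard-set suggestion is to store the target quantile distribution once and map each new chip onto it, leaving the already-processed chips untouched. A minimal sketch under that assumption (the function name is hypothetical; Bioconductor's preprocessCore package offers a similar facility via a user-supplied target distribution):

```python
def normalize_against_targets(chip, targets):
    """Quantile-normalize a single new chip against a fixed, precomputed
    target distribution, without re-normalizing the standard set."""
    n = len(chip)
    # Order of probe indices by intensity within the new chip
    ranks = sorted(range(n), key=lambda i: chip[i])
    out = [0.0] * n
    for rank, idx in enumerate(ranks):
        out[idx] = targets[rank]
    return out

# Each new chip is mapped onto the frozen targets independently:
new_chip = normalize_against_targets([10.0, 2.0, 7.0], [1.0, 2.0, 3.0])
# new_chip is [3.0, 1.0, 2.0]
```

This removes the need to rerun normalization over the whole set for each arriving chip, though it does not solve the problem of building the standard set at the start of a project.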

I am looking forward to hearing a good solution to this problem, too.

Bye.
Kawai
