[BioC] RMA question

Harbron, Chris Chris.Harbron at astrazeneca.com
Mon Dec 18 16:19:50 CET 2006

Hi James,

Can I point you in the direction of the RefPlus package available in
Bioconductor release 2.0, which will do what I think you are looking
for, i.e. allowing additional cel files to be added into a data set
without affecting the gene expression or the normalisation parameters
calculated from the previously processed cel files. 
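
To make the idea concrete, here is a minimal sketch of the normalisation half only: the reference quantiles are computed once from the already processed arrays and then kept fixed, so a new array is mapped onto them without changing anything computed earlier. This is not the RefPlus API. The function names and toy intensities are hypothetical, and background correction is left out.

```python
def reference_quantiles(training):
    """Rank-by-rank mean of the sorted intensities across the training arrays."""
    sorted_arrays = [sorted(a) for a in training]
    n = len(sorted_arrays)
    return [sum(vals) / n for vals in zip(*sorted_arrays)]


def normalize_to_reference(sample, ref):
    """Replace each value in `sample` by the reference value of the same rank.

    Ties are broken by position (stable sort), a simplification for the sketch.
    """
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    out = [0.0] * len(sample)
    for rank, idx in enumerate(order):
        out[idx] = ref[rank]
    return out


# Toy data: two training arrays with three probes each (hypothetical values).
training = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
ref = reference_quantiles(training)          # fixed once, from training only

# A new array is normalised against the stored reference; `ref` is not updated.
new_array = [10.0, 30.0, 20.0]
normalized = normalize_to_reference(new_array, ref)
```

Because `ref` depends only on the training arrays, adding further arrays later cannot change the values already obtained for earlier ones.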

You might also want to check out the paper from Darlene Goldstein in
Bioinformatics (2006 p2364-2372) which discusses similar algorithms.

All the best


Chris Harbron
Technical Lead Statistician,

  I have a question about RMA normalization. Since RMA normalizes across
samples, suppose I have 50 training samples (cel files) and 50 test
samples (cel files). There are two ways to perform the normalization:
1. Combine all 100 samples and run RMA on them together. Then train on
the 50 training samples and classify the 50 test samples.
2. Use the 50 training samples to do RMA, then each cel file is
converted to gene expression vector. Suppose the mapping from cel file
to expression vector is:
Expression = f(cel). The form of f is determined by the 50 training cel
files. Then apply the same mapping to the test cel files. 
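
One way to picture f for the summarisation step, as a sketch under assumptions rather than the actual RMA or RefPlus code: fit probe effects by median polish on the training arrays of one probe set, freeze them, and summarise a new array as the median of its probe-effect-corrected log2 intensities. The function names and toy numbers are hypothetical, and the inputs are assumed to be already background corrected and normalised.

```python
from statistics import median


def median_polish(y, n_iter=10):
    """Simplified median polish for one probe set.

    y[i][j] is the log2 intensity of probe j on training array i.
    Returns (chip_effects, probe_effects); the overall level is folded
    into the chip effects.
    """
    n_chips, n_probes = len(y), len(y[0])
    resid = [row[:] for row in y]
    chip = [0.0] * n_chips
    probe = [0.0] * n_probes
    for _ in range(n_iter):
        # Sweep out row (array) medians.
        for i in range(n_chips):
            m = median(resid[i])
            chip[i] += m
            resid[i] = [v - m for v in resid[i]]
        # Sweep out column (probe) medians.
        for j in range(n_probes):
            m = median(resid[i][j] for i in range(n_chips))
            probe[j] += m
            for i in range(n_chips):
                resid[i][j] -= m
    # Centre the probe effects so their median is zero.
    shift = median(probe)
    return [c + shift for c in chip], [p - shift for p in probe]


def summarize_new_array(y_new, probe_effects):
    """Expression of a new array using the frozen training probe effects."""
    return median(v - p for v, p in zip(y_new, probe_effects))


# Toy probe set: two training arrays, three probes (hypothetical values).
y_train = [[7.0, 8.0, 9.0], [9.0, 10.0, 11.0]]
chip_effects, probe_effects = median_polish(y_train)

# A test array is summarised without refitting anything on the training set.
expression = summarize_new_array([11.0, 12.0, 13.0], probe_effects)
```

Here f is fully determined by the stored probe effects (together with a stored normalisation reference), so the test arrays play no part in fitting it.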

  I would think method 2 is more reasonable and truly blind. However,
it is not clear how to determine the function f from the 50 training cel
files. Method 1 is easy to implement, but it is not truly blind, since
the normalization of the training cel files actually uses information
from the test cel files.
  Could anybody tell me how to determine the function f from the 50
training cel files? 

  Many thanks,
