[BioC] RMA question
kostka at molgen.mpg.de
Mon Dec 18 11:54:09 CET 2006
briefly, to make new chips comparable to a training data set normalized
with RMA you can do the following:
normalize your training arrays keeping track of:
(1) the means over the ranks used in quantile normalization
(2) the probe effects estimated by the median polish procedure
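
A minimal sketch of the bookkeeping in (1) and (2), written in Python with NumPy rather than R. It assumes background-corrected log2 PM intensities in a probes-by-arrays matrix. The function names are hypothetical, and the median polish is a plain Tukey median polish written for illustration, not the code used by any Bioconductor package.

```python
import numpy as np

def reference_quantiles(train):
    """(1) Mean over arrays at each rank.
    train: (n_probes, n_arrays) background-corrected log2 intensities."""
    return np.sort(train, axis=0).mean(axis=1)

def quantile_normalize(train, ref):
    """Replace every value by the reference value at its within-array rank.
    Ties are broken by position, which is a simplification."""
    ranks = np.argsort(np.argsort(train, axis=0), axis=0)
    return ref[ranks]

def median_polish(y, n_iter=10):
    """(2) Fit y[i, j] = overall + probe[i] + chip[j] + residual for ONE
    probe set. y: (n_probes_in_set, n_arrays), already normalized.
    Returns (probe_effects, chip_expression), chip_expression = overall + chip."""
    resid = np.array(y, dtype=float)
    overall = 0.0
    probe = np.zeros(resid.shape[0])
    chip = np.zeros(resid.shape[1])
    for _ in range(n_iter):
        r = np.median(resid, axis=1)   # sweep out row (probe) medians
        resid -= r[:, None]
        probe += r
        d = np.median(chip)            # re-centre chip effects
        chip -= d
        overall += d
        c = np.median(resid, axis=0)   # sweep out column (chip) medians
        resid -= c[None, :]
        chip += c
        d = np.median(probe)           # re-centre probe effects
        probe -= d
        overall += d
    return probe, overall + chip
```

The values to store for later use are the output of `reference_quantiles` and, for each probe set, the first element returned by `median_polish`.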
as the background correction is performed chip-by-chip, you can
transform each test (future) array to be compatible with the training
arrays (and the classifier) using the above information. f() then works
roughly like this:
* replace each (ranked) test-expression value with the mean for its
rank from (1) (you're normalized now)
* calculate a chip effect for each probe set by subtracting the
probe effects from (2) from the probes in that probe set (you're done now)
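
The two steps above can be sketched as follows (Python with NumPy, hypothetical function names). This assumes the new array is already background-corrected and log2-transformed and has the same number of probes as the training arrays. Summarising each probe set by the median of the corrected values is an assumption and is not stated above.

```python
import numpy as np

def normalize_to_reference(x, ref):
    """Step 1: replace each value of a new array by the stored mean at its rank.
    x:   (n_probes,) background-corrected log2 intensities of ONE new array.
    ref: (n_probes,) means over the ranks from training, sorted ascending."""
    ranks = np.argsort(np.argsort(x))  # rank 0 = smallest value
    return ref[ranks]

def chip_effects(x_norm, probe_effects, probe_sets):
    """Step 2: one expression value per probe set.
    probe_sets:    dict name -> index array into x_norm.
    probe_effects: dict name -> stored training probe effects, same order.
    The median summary is an assumption made for this sketch."""
    return {
        name: float(np.median(x_norm[idx] - probe_effects[name]))
        for name, idx in probe_sets.items()
    }

# Toy usage with made-up numbers: 4 probes in 2 probe sets.
ref = np.array([1.5, 3.5, 5.5, 7.5])
x_new = np.array([10.0, 2.0, 7.0, 4.0])
sets = {"a": np.array([0, 1]), "b": np.array([2, 3])}
effects = {"a": np.array([0.5, -0.5]), "b": np.array([1.0, -1.0])}
expr = chip_effects(normalize_to_reference(x_new, ref), effects, sets)
```

Nothing in either function looks at other test arrays, so each new chip can be processed on its own.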
i can send you the code for the above, in case you are interested.
all the best,
Naomi Altman wrote:
> I would say that it depends on how you plan to use the classification function.
> If, in future, you will collect more samples, and use the
> classification function to classify them, then you need to normalize
> the test set the same way you will normalize the new arrays.
> How you plan to do this may also affect how you normalize the training set.
> At 02:53 PM 12/17/2006, Wolfgang Huber wrote:
>> Hi James,
>> this is a general problem of normalization methods that work by adapting
>> arrays in a set to themselves, and not to an independent reference.
>> Option 1 is indeed discredited when you want to get a fair estimate of
>> classification rates, since it does not faithfully simulate the real
>> application where you want to classify a new sample.
>> Option 2 does not work since f contains for each array a number of
>> array-specific, idiosyncratic parameters that reflect hybridization
>> conditions, labeling efficiency, RNA extraction etc. You cannot "learn"
>> them in advance.
>> The option I'd take is to look for a normalization method that
>> normalizes each new array individually (or in sets appropriate to your
>> intended application) to an existing database of reference arrays. I
>> know that various people on this list have been/are working on such
>> methods. But I am probably not up-to-date myself - maybe someone can
>> Best wishes
>> Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber
>>> Hi, I have a question about RMA normalization. Since RMA is an
>>> across-array normalization, suppose I have 50 training samples (cel
>>> files) and 50 test samples (cel files). There are two ways to perform
>>> normalization:
>>> 1. Combine all the 100 samples together and use RMA to do
>>> normalization. Then train on the 50 training samples to classify the
>>> 50 test samples.
>>> 2. Use the 50 training samples to do RMA, then each cel file is
>>> converted to a gene expression vector. Suppose the mapping from cel
>>> file to expression vector is:
>>> Expression = f(cel). The form of f is determined by the 50 training
>>> cel files. Then apply the same mapping to the test cel files.
>>> I would think method 2 is more reasonable and truly blind. However,
>>> it is not clear how to determine the function f from the 50 training cel
>>> files. Method 1 is easy to implement, but it is not truly blind, since
>>> the normalization of cel files from training samples actually utilizes
>>> the information from test cel files.
>>> Could anybody tell me how to determine the function f from the 50
>>> training cel files?
>>> Many thanks, James
> Naomi S. Altman 814-865-3791 (voice)
> Associate Professor
> Dept. of Statistics 814-863-7114 (fax)
> Penn State University 814-865-1348 (Statistics)
> University Park, PA 16802-2111
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor