[BioC] Terminology was RE: RMA normalization

Laurent Gautier lgautier at altern.org
Fri Sep 17 00:01:19 CEST 2004


Wolfgang,


w.huber at dkfz-heidelberg.de wrote:
> Hi Ben,
> 
> 
>>I hate to be pedantic, but really people should be careful about how
>>they utilize the term "normalization".  ...
> 
> 
> I fully agree with your posting, I didn't intend to equate RMA or GCRMA
> with just normalization and I had hoped that everybody on this list was
> aware that Affymetrix preprocessing involves more than just
> "normalization". But these other aspects weren't the point of our posting,
> which equally applies to vsn, loess, whatever.
> 
> In fact I think "normalization" is not a very useful term at all: what we
> do there has nothing to do with the normal distribution, and I don't see
> what meaning the word root "normal" has in there.

Regarding semantics (too), bioinformatics can be a real headache. The 
meaning of 'normal' in this context is not the one used in statistics
(nor the one used in geometry).
It probably has more to do with the "normal" used by chemists or 
geologists. The aim of this particular step is to 
'reduce'/transform/(pre-)process the signal in such a way that effects 
like scanner settings or differences in the total amount of labelled 
target are corrected. As the problems become better understood (one of 
them being the case where a majority of the genes are suspected to be 
differentially expressed), this pre-processing step can consist in 
'tweaking' the data with other explicit objectives in mind (variance 
stabilization being one example). As you say below, the word is 
increasingly used to describe what is no longer necessarily the primary 
objective of the transformation.
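
To make this concrete, here is a small R sketch of the kind of array-wide 
effect that gets corrected (the Dilution data from the affydata package is 
just a convenient example AffyBatch, not anything specific to this thread):

library(affy)
library(affydata)
data(Dilution)

## raw log2 PM intensities: the arrays sit at different overall levels,
## e.g. because of scanner settings or the total amount of labelled target
boxplot(log2(pm(Dilution)), main = "raw log2 PM intensities")

## quantile normalization forces the arrays onto a common intensity
## distribution, removing such array-wide effects
Dilution.norm <- normalize(Dilution, method = "quantiles")
boxplot(log2(pm(Dilution.norm)), main = "after quantile normalization")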

> The word is often used in muddled way to mean all sorts of pre-processing.
> But again, "pre-processing" is not a particularly precise or intuitive
> term either.

I like 'pre-processing'. Although it is not extremely precise, I find 
it reasonably intuitive: the prefix 'pre' indicates that this is 
something done early in the process...

> The problem I see is that the different aspects of pre-processing are not
> independent of each other; as soon as you start slicing up the problem of
> pre-processing in different sub-steps, that already involves
> approximations and presumptions about how to solve the problem. In the
> affy package / expresso method you (and Rafa & Laurent & others) have come
> up with a great and extremely useful way of slicing up the problem, but of
> course that's not the end of the story (as I understand, does the
> continuing work on methods like affyPLM indicate.)

An object model (in the computer-science sense) was proposed to let 
end-users perform pre-processing and, at the same time, to let people 
interested in the methods explore new approaches to pre-processing.
However it is only a model, and it clearly has limitations... 
implementing the PDNN method for the affy package required more than 
the usual number of programming tricks (one of which was fishing a 
variable out of an enclosing frame using dynamic scoping). As new trends 
in pre-processing appear, we will see more clearly how to modify the 
object structure so that people can implement new approaches easily 
(we also hope that people will implement them directly in Bioconductor, 
rather than us implementing them from their papers).
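
To illustrate the slicing into sub-steps, here is a minimal sketch with 
expresso() (again using the Dilution example data; the method names below 
simply reproduce RMA step by step and are only one possible combination):

library(affy)
library(affydata)
data(Dilution)

eset <- expresso(Dilution,
                 bgcorrect.method = "rma",          # background correction
                 normalize.method = "quantiles",    # between-array normalization
                 pmcorrect.method = "pmonly",       # PM correction
                 summary.method   = "medianpolish") # probe set summarization

Each of the four slots takes the name of a registered method, which is what 
lets people plug in their own approach to one step without touching the rest 
of the pipeline.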



L.




> Best wishes
>  Wolfgang
> 


