[BioC] Question about patchwork affy pre-processing
izmirlian at nih.gov
Mon Jun 11 22:01:56 CEST 2007
I'm involved in an experiment using Affymetrix HG-U133 Plus 2 arrays.
I have affy, gcrma, and other relevant libraries up and running
on my Linux system.
I preprocessed using the 'threestep' function (which is in the
affyPLM library), with the following settings:
normalize.method = "quantile.robust"
summary.method = "median.polish"
background.method = "GCRMA"
My question is this. Someone told me that their biostatistician
usually preprocesses via RMA and then merges MAS-5 present/absent
calls into the resulting data frame. Those calls are then used to omit
genes with MAS-5 absent calls from any further analysis.
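To make the suggested procedure concrete, here is a minimal sketch in base R of the filtering step as I understand it. The toy matrices, the function name filter_by_calls, and the min_present threshold are all made up for illustration. In practice the expression values would come from an RMA-type fit and the calls from something like mas5calls() in the affy library, and I don't know what rule that biostatistician actually applies.

```r
# Toy expression matrix: 5 probesets x 3 arrays (values invented for illustration).
expr <- matrix(
  c(7.1, 7.3, 7.0,
    3.2, 3.1, 3.4,
    9.8, 9.5, 9.9,
    4.0, 6.2, 4.1,
    5.5, 5.6, 5.4),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("probe", 1:5), paste0("array", 1:3))
)

# Toy detection calls for the same probesets and arrays:
# "P" = present, "M" = marginal, "A" = absent.
calls <- matrix(
  c("P", "P", "P",
    "A", "A", "A",
    "P", "P", "M",
    "A", "P", "A",
    "A", "A", "M"),
  nrow = 5, byrow = TRUE,
  dimnames = dimnames(expr)
)

# Hypothetical helper: keep a probeset only if it is called present
# on at least min_present arrays (the threshold is an assumption).
filter_by_calls <- function(expr, calls, min_present = 1) {
  stopifnot(identical(dimnames(expr), dimnames(calls)))
  keep <- rowSums(calls == "P") >= min_present
  expr[keep, , drop = FALSE]
}

filtered <- filter_by_calls(expr, calls)
dropped  <- setdiff(rownames(expr), rownames(filtered))
```

Here 'dropped' holds the probesets that never reach the later analysis, which is the hidden level of filtering I am worried about.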
My feeling is that MAS-5.0 is inferior at all three steps mentioned above
(background correction, normalization, and summarization), and that if
present/absent calls are based upon inferior techniques they should not
be used. I also believe that people are moving away from what I view as
a hidden level of filtering. It is my belief that the best way to do
filtering is once, at the analysis stage.
Am I right in thinking that this approach is a bad idea?
I have followed the debate on PM-only methods, and in my mind the development
of GCRMA now provides an efficient way to model the MMs, so that background
correction can be done without doubling the per-gene noise.
Certainly the normalization, background correction, and summary methods of
'threestep' are all the result of research that has applied the best
statistical principles in lieu of the rather ad hoc techniques contained in
MAS-5, and that has succeeded in refining the methods of MAS-5.
My reading of the literature and of best practice tells me that filtering on
MAS-5 calls in this way is not really a preferable way to do things.