[BioC] Use of RMA in increasingly-sized datasets
KiplingD at cardiff.ac.uk
Fri Jun 3 10:07:46 CEST 2005
This is not a "how do I process 1000 chips with RMA" question, but rather
something slightly different.
We're starting to get projects coming through our Affy core that involve
1000+ chips. Obviously we can use MAS5 to process the .cel files, and
irrespective of what happens with subsequent chips in the project the
expression values from those chips will stay the same because of the
single-chip nature of the algorithm.
It would be nice to run, in parallel, RMA-style processing of the data.
The issue this raises for me relates to the desire of the scientists
to look at their data before the end of the project (e.g. you'd want to
explore the first 200 cancer samples rather than wait for all 1000 to
be done), which is understandable. My concern is that the multi-chip
nature of RMA means that, for any specific .cel file, the expression
values will depend on the other chips included in the run, and so the
expression values from that .cel file will be different in the early
stages (200 chips) and at the end (1000 chips). Such a 'moving target'
dataset may be confusing and would certainly cause an audit headache.
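
To make the concern concrete, here is a minimal sketch in Python (standard library only) of the quantile normalization step, which is one of the multi-chip steps in RMA. It is only an illustration under simplifying assumptions: the chip names and intensity values are made up, each chip has just four probes, and background correction and median-polish summarization are left out. The point is simply that the same chip receives different normalized values depending on which other chips are in the run.

```python
def quantile_normalize(chips):
    """Quantile-normalize a list of equal-length intensity lists (one per chip)."""
    n_chips = len(chips)
    n_probes = len(chips[0])
    sorted_chips = [sorted(c) for c in chips]
    # Reference distribution: mean of the k-th smallest value across all chips.
    reference = [sum(s[k] for s in sorted_chips) / n_chips for k in range(n_probes)]
    normalized = []
    for chip in chips:
        # Probe indices ordered by intensity within this chip.
        order = sorted(range(n_probes), key=lambda i: chip[i])
        out = [0.0] * n_probes
        for rank, probe_idx in enumerate(order):
            out[probe_idx] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical toy intensities, purely for illustration.
chip_a = [2.0, 8.0, 5.0, 11.0]
chip_b = [3.0, 9.0, 4.0, 10.0]
chip_c = [6.0, 14.0, 7.0, 20.0]

early = quantile_normalize([chip_a, chip_b])          # "first 200 chips"
late = quantile_normalize([chip_a, chip_b, chip_c])   # "all 1000 chips"

print("chip_a, early run:", early[0])
print("chip_a, late run: ", late[0])
```

The two printed rows for chip_a differ, because adding chip_c changes the reference distribution that every chip is mapped onto. Whether the shift becomes negligible once the run already holds 100+ chips is exactly the open question; this toy example cannot answer it.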
Has anyone explored this issue and proposed a solution? It's entirely
possible that I am being totally paranoid and that after 100+ chips in
a dataset the expression values plateau out and are stable in the face
of additional .cel files being included; I don't yet have access to
big-enough datasets to critically address that. I do have some
recollection, from the deep mists of time, of a comment (from Ben Bolstad?)
suggesting the use of a standard 'training set' of (say) 50 chips, to
which you would add your new chips one at a time and process.
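
The 'training set' idea could be sketched as follows, again in plain Python and again showing only the quantile normalization step. This is an assumption about how such a scheme might work, not a description of any existing implementation: the function name process_with_training and the toy data are hypothetical. Each new chip is processed together with the fixed training set and nothing else, so its result depends only on the training set and on the chip itself.

```python
def quantile_normalize(chips):
    """Quantile-normalize a list of equal-length intensity lists (one per chip)."""
    n_chips = len(chips)
    n_probes = len(chips[0])
    sorted_chips = [sorted(c) for c in chips]
    # Reference distribution: mean of the k-th smallest value across all chips.
    reference = [sum(s[k] for s in sorted_chips) / n_chips for k in range(n_probes)]
    normalized = []
    for chip in chips:
        order = sorted(range(n_probes), key=lambda i: chip[i])
        out = [0.0] * n_probes
        for rank, probe_idx in enumerate(order):
            out[probe_idx] = reference[rank]
        normalized.append(out)
    return normalized

def process_with_training(new_chip, training_set):
    """Hypothetical helper: normalize one new chip alongside a fixed training set."""
    run = list(training_set) + [new_chip]
    # Return only the new chip's values; the training chips are discarded.
    return quantile_normalize(run)[-1]

# Hypothetical toy data: a two-chip "training set" and two project chips.
training = [[2.0, 8.0, 5.0, 11.0], [3.0, 9.0, 4.0, 10.0]]
chip_1 = [6.0, 14.0, 7.0, 20.0]
chip_2 = [1.0, 30.0, 12.0, 15.0]

first_pass = process_with_training(chip_1, training)   # early in the project
_ = process_with_training(chip_2, training)            # another chip arrives later
second_pass = process_with_training(chip_1, training)  # reprocess chip_1 afterwards

print(first_pass == second_pass)
```

Because chip_2 never enters the run that processes chip_1, the values for chip_1 are the same early and late in the project, which would address the audit concern. The cost is that the result now depends on the choice of training set, which would itself need to be fixed and recorded.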
All comments, thoughts, or experiences gratefully received!
Prof David Kipling
Department of Pathology
School of Medicine
Cardiff CF14 4XN
Tel: 029 2074 4847
Email: KiplingD at cardiff.ac.uk