[BioC] Questions related to processing large dataset in batches

James MacDonald jmacdon at med.umich.edu
Thu Jul 1 17:38:34 CEST 2004

If you are only planning on doing rma or gcrma, you might take a look at
justRMA and justGCRMA, which use much less memory. I was just informed
off-list that you can do 116 of the hgu133plus2 chips with 1 Gb RAM
using justRMA.



James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109

>>> "Li, Aiguo (NIH/NCI)" <liai at mail.nih.gov> 06/30/04 03:33PM >>>
Hello, everyone.
Our project is a beta tester of Affy HGU133 plus 2 chips, which have
56,000 probes, and the .cel file size in text format is about 32 MB. 
I currently have more than 100 chips to process.  I tried to read
the .cel files into my machine (1Gb RAM) and it can only read in 19 chips. 
I have been communicating with several R experts in our mailing list and
some of them suggested that I split the data into batches during the probe
level analysis and combine at the probeset level using the R cbind/merge
functions.  I
think that this probably is the best option for me because I have
on data handling capabilities even though I can upgrade my RAM to 4Gb.
However, my second concern is whether batch analysis would
have any effect on the final data analysis results.  In my opinion,
normalization should be done at once across all chips. 
Can I
do probe level normalization during the batch analysis and then do an
additional normalization at the probeset level across all chips after
combining the data using R/Bioconductor?
Thanks in advance,
Aiguo Lee
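
The concern about batch-wise normalization can be illustrated with a toy sketch of quantile normalization, the kind of across-chip normalization that rma applies. This is not the affy implementation: the function name quantile_normalize and the intensity values are made up for illustration, and background correction and summarization are left out. It only shows that normalizing two batches separately can leave them on different scales, whereas normalizing all chips together does not.

```python
def quantile_normalize(chips):
    """Quantile-normalize a list of equal-length intensity lists (one per chip).

    Hypothetical helper for illustration only, not the affy/RMA code.
    """
    n_probes = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: mean of the k-th smallest value across chips.
    reference = [
        sum(s[k] for s in sorted_chips) / len(chips) for k in range(n_probes)
    ]
    normalized = []
    for chip in chips:
        # Probe indices ordered by intensity (ties broken by position).
        order = sorted(range(n_probes), key=lambda i: chip[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Made-up intensities: four chips, three probes each.
chips = [
    [2.0, 4.0, 6.0],
    [3.0, 5.0, 7.0],
    [10.0, 20.0, 30.0],
    [12.0, 22.0, 32.0],
]

# Normalize all chips together, then in two batches of two chips.
all_at_once = quantile_normalize(chips)
in_batches = quantile_normalize(chips[:2]) + quantile_normalize(chips[2:])

print(all_at_once[0], all_at_once[3])
print(in_batches[0], in_batches[3])
```

In this toy case every chip shares one distribution after the all-at-once run, while the batch run leaves the first two chips on a different scale from the last two. That is the situation in which a further normalization across all chips at the probeset level would be considered.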
