[BioC] Questions related to processing large dataset in batches

Li, Aiguo (NIH/NCI) liai at mail.nih.gov
Wed Jun 30 21:33:25 CEST 2004


Hello, everyone.
 
Our project is a beta tester of Affy HGU133 Plus 2 chips, which contain
56,000 probes; each .cel file in text format is about 32 MB.  We
currently have more than 100 chips to process.  When I tried to read the
.cel files into my machine (1 GB RAM), it could only read in 19 chips.  I
have been communicating with several R experts on our mailing list, and some
of them suggested that I split the data into batches during the probe-level
analysis and then combine the results at the probeset level using the R
cbind/merge functions.  I think this is probably the best option for me,
because I have concerns about data handling capacity even if I upgrade my
RAM to 4 GB.
However, my second concern is whether batch analysis will affect the final
results.  In my opinion, normalization across chips should be done on all
chips at once.  Can I do probe-level normalization within each batch, and
then apply an additional normalization at the probeset level across all
chips after combining the data, using R/Bioconductor?
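To make the question concrete, here is a minimal sketch of the batch
workflow I have in mind, assuming the 'affy' package; the directory name
and batch size are placeholders, and the final probeset-level
normalization step (shown with limma's normalizeQuantiles) is the part I
am unsure about:

```r
## Hypothetical sketch of the batch approach (assumed file layout).
library(affy)

cel.files <- list.files("celfiles", pattern = "\\.CEL$", full.names = TRUE)
## Split ~100 chips into batches of ~10 that fit in memory.
batches <- split(cel.files, ceiling(seq_along(cel.files) / 10))

## Probe-level processing per batch (justRMA reads the CEL files and
## runs RMA), keeping only the probeset-level expression matrix.
expr.list <- lapply(batches, function(files) {
    eset <- justRMA(filenames = files)
    exprs(eset)
})

## Combine at the probeset level: rows (probesets) come back in the
## same order for every batch, so cbind is sufficient.
expr.all <- do.call(cbind, expr.list)

## Possible second-pass normalization across all chips at the
## probeset level, e.g. quantile normalization from 'limma':
## expr.all <- limma::normalizeQuantiles(expr.all)
```

The concern is whether within-batch quantile normalization at the probe
level, followed by this across-batch step, is equivalent to normalizing
all 100+ chips together.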
 
Thanks in advance,
 
Aiguo Lee

