[BioC] large amount of slides

Marcus Davy MDavy at hortresearch.co.nz
Tue Jun 8 23:38:36 CEST 2004

You can use the function object.size to estimate the storage of any
expression set object.
> object.size(affybatch.example)
[1] 243384
> dim(exprs(affybatch.example))
[1] 10000     3
> object.size(exprs(affybatch.example))
[1] 240280
> object.size(exprs(affybatch.example)) / prod(dim(exprs(affybatch.example)))
[1] 8.009333

Each double-precision matrix value takes 8 bytes of storage, so you can
estimate the amount of memory required for n genes by 200 arrays, plus
annotation information etc.
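The estimate above can be written as a one-line R function (a sketch only; the function name est_bytes is mine, and the figure covers just the raw expression matrix, not annotation or other slots):

```r
## Rough storage estimate for an n_genes x n_arrays expression matrix;
## each double-precision value takes 8 bytes (object.size() reports a
## little more because of R's per-object overhead).
est_bytes <- function(n_genes, n_arrays) n_genes * n_arrays * 8

est_bytes(10000, 3)           # 240000 bytes, close to object.size() above
est_bytes(27000, 200) / 2^20  # ~41.2 Mb for a 27000 x 200 matrix
```

This matches the transcript above: 240280 bytes reported for the 10000 x 3 matrix, of which 240000 are the doubles themselves.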
On a *standard* Windows XP (or 2000) machine running R 1.9.0 you can
increase the addressable memory space with the --max-mem-size=2G argument
when you run the executable; details are in the Windows FAQ. Check that it
has increased:
> memory.limit()
[1] 2147483648

Memory-intensive algorithms could start running out of addressable memory
on some 32-bit architectures for large datasets. For example, the SAM
permutation-testing function in Bioconductor's siggenes package with
B=1000 permutations on 27000 genes is likely to have problems on some
32-bit platforms, depending on physical memory and the virtual page size
available to the operating system.
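A back-of-the-envelope calculation shows why that case is heavy (a simplification, not siggenes' actual internals: I assume the permutation statistics are held as one genes x B matrix of doubles):

```r
## Permutation statistics for B = 1000 permutations of 27000 genes,
## stored as a 27000 x 1000 matrix of 8-byte doubles:
perm_bytes <- 27000 * 1000 * 8
perm_bytes / 2^20  # ~206 Mb for that one matrix alone
```

On a 32-bit platform with a 2 Gb per-process limit, a few working copies of an object that size, on top of the raw data, can exhaust the address space.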

>>> "Park, Richard" <Richard.Park at joslin.harvard.edu> 5/06/2004 3:40:42
AM >>>
Hi Vada, 
I would caution you about running rma on that many datasets. I have
noticed a trend in rma that things get even more underestimated as the
number and variance of the data increase. I have been doing an analysis on
immune cell types for about 100 CEL files. My computer (Windows 2000,
2 GB of RAM, 2.6 GHz Pentium 4) gives out around 70 datasets; I am pretty
sure that my problem is that Windows 2000 has a maximum allocation of
2 GB of addressable memory per process.

But if most of your data is pretty related (i.e. same tissues, just a
KO vs WT comparison) you should be fine with rma. I would caution against
using rma on data that is very different.
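For reference, the workflow under discussion can be sketched as follows (a sketch only, using the affy package's ReadAffy(), rma(), and justRMA(); the choice of justRMA() as a lower-memory alternative is my suggestion, not something from this thread):

```r
library(affy)  # Bioconductor's affy package

## Read every *.CEL file in the working directory into one AffyBatch,
## then compute RMA expression values (memory-hungry for many chips):
abatch <- ReadAffy()
eset   <- rma(abatch)

## Alternative: justRMA() reads the CEL files and returns the RMA
## expression measures directly, without keeping the full probe-level
## AffyBatch around, which uses less memory for large batches:
eset2  <- justRMA()
```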


-----Original Message-----
From: Vada Wilcox [mailto:v_wilcox at hotmail.com] 
Sent: Friday, June 04, 2004 11:06 AM
To: bioconductor at stat.math.ethz.ch 
Subject: [BioC] large amount of slides

Dear all,

I have been using RMA successfully for a while now, but in the past I
only used it on a small number of slides. I would like to do my study on a
larger scale now, with data (series of experiments) from other researchers
as well. My question is the following: if I want to study, let's say, 200
slides, do I have to read them all into R at once (together, I mean, with
read.affy() in package affy), or is it OK to read them series by series
(i.e. all wild types and controls of one researcher at a time)?

If it is really necessary to read all of them in at one time, how much RAM
would I need (for, let's say, 200 CEL files), and how can I raise the RAM?
I know it's possible to raise it by using 'max vsize = ...' but I haven't
been able to do it successfully for 200 experiments. Can somebody help me
out?

Many thanks in advance,




Bioconductor mailing list
Bioconductor at stat.math.ethz.ch 
