[BioC] large amount of slides

Marcus Davy MDavy at hortresearch.co.nz
Tue Jun 8 23:38:36 CEST 2004


Hi,
you can use the function object.size() to estimate the storage required by
any expression set object, e.g.
> object.size(affybatch.example)
[1] 243384
> dim(exprs(affybatch.example))
[1] 10000     3
> object.size(exprs(affybatch.example))
[1] 240280
> object.size(exprs(affybatch.example)) /
+ (nrow(exprs(affybatch.example))*ncol(exprs(affybatch.example)))
[1] 8.009333

Each double-precision matrix value takes 8 bytes of storage, so you can
estimate the amount of memory required for an expression matrix of n genes
by 200 arrays, plus annotation information etc.
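
For example, a rough back-of-the-envelope calculation for the expression
matrix alone (27000 genes is just an assumed figure; object overhead and
any copies made during normalisation are ignored):

> 27000 * 200 * 8             # bytes for a 27000 x 200 matrix of doubles
[1] 43200000
> (27000 * 200 * 8) / 2^20    # roughly 41 Mb
[1] 41.19873
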
On a *standard* Windows XP (or 2000) machine running R 1.9.0 you can
increase the addressable memory space with the --max-mem-size=2G argument
when you run the executable; details are in the R for Windows FAQ. Check
that it has increased with:
> memory.limit()
[1] 2147483648
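
For reference, the invocation meant above looks something like the
following (the install path below is only an assumption and will differ
between machines):

  "C:\Program Files\R\rw1090\bin\Rgui.exe" --max-mem-size=2G

See the --max-mem-size entry in the R for Windows FAQ for the details that
apply to your version.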

Memory-intensive algorithms can run out of addressable memory on some
32-bit architectures for large datasets. For example, Bioconductor's
siggenes sam permutation testing function with B=1000 permutations on
27000 genes is likely to have problems on some 32-bit platforms, depending
on physical memory and the amount of virtual memory (page file) available
to the operating system.
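
To put that example in perspective, a single matrix of permutation
statistics for 27000 genes and B=1000 permutations would by itself need
roughly the following (a rough illustration only; it ignores how siggenes
actually stores its intermediate results):

> 27000 * 1000 * 8            # bytes for a 27000 x 1000 matrix of doubles
[1] 216000000
> (27000 * 1000 * 8) / 2^20   # about 206 Mb
[1] 205.9937

A few working copies of objects that size, on top of the probe-level data
itself, add up quickly within a 32-bit address space.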


marcus


>>> "Park, Richard" <Richard.Park at joslin.harvard.edu> 5/06/2004 3:40:42
AM >>>
Hi Vada,
I would caution you about running RMA on that many datasets. I have noticed
a trend in RMA that things get even more underestimated as the number and
variance of the data increase. I have been doing an analysis on immune cell
types with about 100 CEL files. My computer (Windows 2000, 2 GB of RAM,
2.6 GHz Pentium 4) gives out at around 70 datasets; I am pretty sure that
my problem is that Windows 2000 has a maximum allocation of 1 GB.

But if most of your data is closely related (e.g. the same tissues, just a
KO vs WT comparison) you should be fine with RMA. I would caution against
using RMA on data that is very different.

hth, 
richard 

-----Original Message-----
From: Vada Wilcox [mailto:v_wilcox at hotmail.com] 
Sent: Friday, June 04, 2004 11:06 AM
To: bioconductor at stat.math.ethz.ch 
Subject: [BioC] large amount of slides


Dear all,

I have been using RMA successfully for a while now, but in the past I have
only used it on a small number of slides. I would now like to do my study
on a larger scale, with data (series of experiments) from other researchers
as well. My question is the following: if I want to study, let's say, 200
slides, do I have to read them all into R at once (so all together, I mean,
with read.affy() in the affy package), or is it OK to read them series by
series (so all wild types and controls of one researcher at a time)?

If it is really necessary to read them all in at one time, how much RAM
would I need (for, let's say, 200 CEL files), and how can I raise the
amount available to R? I know it's possible to raise it by using
'--max-vsize=...', but I haven't been able to do it successfully for 200
experiments. Can somebody help me with this?

Many thanks in advance,

Vada

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch 
https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor 



