[BioC] affy & problems with not enough avaliable memory

Henrik Bengtsson hb at maths.lth.se
Tue Jan 17 05:07:05 CET 2006


Kort, Eric wrote:
> A few comments, in no particular order:
>
> 1. I follow the recommendations of Choe et al. in their article "Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset", EXCEPT that I use MAS5 instead of RMA, specifically because this eliminates the need to normalize all arrays at once.

FYI: (Exact) quantile normalization can be done so that only
*two* arrays need to be kept in memory at the same time.  This can
be done using a two-pass approach, which I believe is what Ben Bolstad
has in RMAExpress and what others suggest too.  It is just a matter of
time before it is available in R too.  With R packages such as
'affxparser', reading probe signals from CEL files takes fractions of
a second, so there is no need to store all probe-level data in memory
just because we think it is much faster; it's not.
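
To make the two-pass idea concrete, here is a minimal sketch in Python
(not R, and not taken from RMAExpress).  It assumes a hypothetical
load_array() callable that stands in for reading one CEL file from
disk; that name and the in-memory toy data are illustrative
assumptions only.  Pass 1 sums the sorted intensities rank by rank to
build the target distribution; pass 2 reloads each array and replaces
each probe by the target value at its rank.  Ties are broken by probe
order here, which is a simplification.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def two_pass_quantile_normalize(
    array_ids: Sequence[str],
    load_array: Callable[[str], List[float]],  # hypothetical reader, one array per call
) -> Iterator[Tuple[str, List[float]]]:
    """Yield (array_id, normalized_values) for one array at a time."""
    # Pass 1: accumulate the sum of sorted intensities, rank by rank.
    # Only the running total and the current array are held in memory.
    total = None
    for array_id in array_ids:
        values = sorted(load_array(array_id))
        if total is None:
            total = [0.0] * len(values)
        if len(values) != len(total):
            raise ValueError("all arrays must have the same number of probes")
        for rank, value in enumerate(values):
            total[rank] += value
    if total is None:
        return  # no arrays given
    target = [s / len(array_ids) for s in total]  # mean of each quantile

    # Pass 2: reload each array and map every probe to the target value
    # at that probe's rank (ties broken by probe order; a simplification).
    for array_id in array_ids:
        values = load_array(array_id)
        order = sorted(range(len(values)), key=values.__getitem__)
        normalized = [0.0] * len(values)
        for rank, probe in enumerate(order):
            normalized[probe] = target[rank]
        yield array_id, normalized


# Toy usage: a dict lookup stands in for reading CEL files from disk.
toy = {"a": [5.0, 2.0, 3.0], "b": [4.0, 1.0, 6.0]}
for name, norm in two_pass_quantile_normalize(list(toy), toy.__getitem__):
    print(name, norm)
```

Every array is read twice, so this only pays off when reading is
cheap, which is the point about fast CEL file access.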

Cheers

Henrik

> 2. I do my normalization on a 64-bit Linux server so I can allocate more than 4GB of memory to the task (4GB being the absolute ceiling of addressable memory using 32-bit addresses).  As mentioned in another thread, this is a little pricey, although I would say the 64GB RAM recommendation of that thread is excessive (I can do 200 hgu133plus2 chips with 16GB of RAM without much difficulty).  Then again, more is more where RAM is concerned.
>
> 3. However, if I didn't have a 64-bit machine at my institution, there are various 64-bit machines at institutions near me that I could have used for this sort of periodic normalization.  It seems to me that there must be such machines available to many/most/nearly all of us just waiting to be identified.
>
> 4. Finally, in concert with Dr. Gentleman's data centralization efforts which he described in an earlier post (see http://www.sgdi.org), it occurs to me that it would be incredibly useful if there were some sort of shared computing resource available to the BioC community.  I know our 64-bit machine is underutilized 95% of the time, and I imagine this is the case for many of these systems.  I envision a 64-bit version of "Folding At Home"..."Normalizing at Home"?  Or are these just fantastical musings of an idle dreamer?
>
> -Eric
>
>
>
>
> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch on behalf of Seth Falcon
> Sent: Mon 1/16/2006 7:16 PM
> To: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] affy & problems with not enough avaliable memory
>
> Hi Nick,
>
> I'm having some deja vu:
> https://stat.ethz.ch/pipermail/bioconductor/2006-January/011471.html
>
>
> On 15 Jan 2006, nicholas-ettinger at uiowa.edu wrote:
>
>>I have 16 Affy Hg-U133-plus-2 CEL files that I am trying to analyze.
>>I am having difficulties generating expression data because of
>>memory problems.  I am working on a Dell desktop running WinXP.
>
>
> How much RAM is installed on your system?  As Jim previously
> commented, purchasing more RAM may be the easiest way to get going.
>
>
>>What can I do?  Do I have to normalize all 16 arrays at once?  Is it
>>still valid to do them piecemeal?  My gut says no.
>
>
> The currently available procedures are intended to normalize a set of
> arrays all at once.  Trust your gut.
>
> We have in the works a normalization routine that will work in limited
> memory environments, but it isn't available yet.
>
> + seth
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>


