[Rd] allocMatrix error

Martin Morgan mtmorgan at fhcrc.org
Tue Feb 17 14:46:08 CET 2009


Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:

> On Tue, 17 Feb 2009, Hamid Ashrafi wrote:
>
>> On Sat, Feb 14, 2009 at 00:17,  <ashrafi at ucdavis.edu> wrote:
>>
>> Hi,
>>
>> I was trying to read ~400 chips into an affybatch and I got the same message.
>> Could you find a remedy for that? My server has 128 GB of RAM. However, R
>> halted even before it used the memory.
>
> We don't have anything like sufficient details (please do read the
> posting guide).
>
> If the issue is the size of matrices, you possibly (depending on the
> compiler) could arrange to compile R (and any relevant system
> libraries) to use 64-bit ints.  For C code in R there is a typedef to
> change, and you would need integer*8 in the Fortran.  We would be
> interested to know the results if you do so, but the developers are
> unlikely to do so for you.
>
> In any case, since you mention 'affybatch' it looks like this might be
> a design issue in that BioC package and the BioC lists might be the
> appropriate place to discuss it.  It is not obvious to me why ~400
> datasets need a single large R object rather than, say, a list of 400
> smaller ones, if that is indeed the problem.  So, to return to my
> first point:
>
>> We don't have anything like sufficient details.
>
> Please give us the full details of your system, the memory in use (see
> ?gc) and what you were trying to do.
>
>
>> I have been able to load up to 250 CEL files, but this time I wanted to test
>> what would happen if I tried to normalize 400 chips.
>
> R can handle up to 16GB objects, which even for a 64-bit OS and 128GB
> of RAM are pretty large objects and do not arise naturally from many
> small files.
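
The 16GB figure and the matrix limit both follow from R indexing elements with a signed 32-bit int, as ?Memory-limits describes for the R versions discussed in this thread. Below is a minimal C sketch, assuming a platform where int is 32 bits. The helper names are hypothetical and not part of R's API. fits_int_index uses the same double-precision product as the allocMatrix check in array.c, so nrow * ncol cannot overflow int before it is compared with INT_MAX.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: nonzero if an nrow x ncol matrix stays within a
   signed int index. The product is formed in double, as in the allocMatrix
   check, so nrow * ncol cannot overflow int before the comparison. */
static int fits_int_index(int nrow, int ncol)
{
    return (double)nrow * (double)ncol <= (double)INT_MAX;
}

/* Hypothetical helper: largest row count for ncol columns such that the
   total number of elements is at most INT_MAX. */
static int max_rows_for(int ncol)
{
    assert(ncol > 0);
    return INT_MAX / ncol;
}

/* Hypothetical helper: bytes needed by the longest double vector that a
   signed int can index (INT_MAX elements of sizeof(double) bytes each). */
static double max_double_vector_bytes(void)
{
    return (double)INT_MAX * (double)sizeof(double);
}
```

With these definitions, max_rows_for(400) is 5368709, so a matrix with one column per chip for 400 chips cannot exceed about 5.4 million rows. max_double_vector_bytes() is 17179869176 bytes, 8 bytes short of 16 GiB.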

Hamid -- Prof. Ripley is correct in pointing you toward the
Bioconductor mailing list

  http://bioconductor.org/docs/mailList.html

The usual solution for very large sets of arrays is to use packages
like aroma.affymetrix or xps that do not hold the objects entirely in
memory, or the AffyPara package to divide large jobs into smaller ones
that are processed in parallel. Also, of course, think about whether
it is statistically reasonable to normalize across all arrays.  There
are discussions of this topic on the Bioc mailing list, so look in the
archive for additional hints.
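
Prof. Ripley's alternative of a list of smaller objects, and the idea of splitting a large job into smaller ones, can be illustrated with a short C sketch. Everything here is hypothetical: the Chip type and the chip_mean and mean_of_chip_means functions do not come from affy, aroma.affymetrix, xps or AffyPara. Each chip lives in its own buffer, so only the per-chip length must fit in an int, while the total across all chips may exceed INT_MAX. A plain mean stands in for real preprocessing. An across-array method such as normalization would need more care, as noted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-chip container: one buffer per array instead of a
   single nrow x nchips matrix, so each buffer only needs n <= INT_MAX. */
typedef struct {
    const double *values;
    int n;
} Chip;

/* Mean intensity of a single chip (0.0 for an empty chip). */
static double chip_mean(const Chip *chip)
{
    assert(chip != NULL);
    double sum = 0.0;
    for (int i = 0; i < chip->n; i++)
        sum += chip->values[i];
    return chip->n > 0 ? sum / chip->n : 0.0;
}

/* Visit the chips one at a time; the combined element count across all
   chips is never used as a single int index. */
static double mean_of_chip_means(const Chip *chips, size_t nchips)
{
    double total = 0.0;
    for (size_t k = 0; k < nchips; k++)
        total += chip_mean(&chips[k]);
    return nchips > 0 ? total / (double)nchips : 0.0;
}
```

Because the loop handles one chip at a time, subsets of chips could equally be handed to separate processes and their results combined afterwards.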

Martin

>> Thanks for your prompt response.
>>
>> Hamid
>>
>>>
>>> Martin Maechler wrote:
>>>>
>>>>>>>>> "VK" == Vadim Kutsyy <vadim at kutsyy.com>
>>>>>>>>>     on Fri, 01 Aug 2008 07:35:01 -0700 writes:
>>>>
>>>>     VK> Martin Maechler wrote:
>>>>    >>
>>>>     VK> The problem is in array.c, where allocMatrix checks for
>>>>     VK> "if ((double)nrow * (double)ncol > INT_MAX)".  But why
>>>>     VK> is int used and not long int for indexing? (max int is
>>>>     VK> 2147483647, max long int is 9223372036854775807)
>>>>    >>
>>>>    >> Well, Brian gave you all info:
>>>>    >>
>>>>     VK> Exactly, and given that most modern systems used for
>>>>     VK> computation (i.e. 64-bit systems) have a long int that is
>>>>     VK> much larger than int, I am wondering why long int is not
>>>>     VK> used for indexing (I don't think that 4-byte vs 8-byte
>>>>     VK> storage is an issue).
>>>>    >> Did you really carefully read ?Memory-limits ??
>>>>    >>
>>>>     VK> Yes, it specifies that a 4-byte int is used for indexing
>>>>     VK> in all versions of R, but why? I think 2147483647
>>>>     VK> elements for a single vector is OK, but not as the total
>>>>     VK> number of elements for a matrix.  I am running out of
>>>>     VK> indexing at a mere 10% memory consumption.
>>>>
>>>> Hmm, do you have 160 GBytes of RAM?
>>>> But anyway, let's move this topic from R-help to R-devel.
>>>>
>>>>    [...........]
>>>>
>>>>     VK> PS: I have no problem going in and modifying C code, but I am
>>>>     VK> just wondering what the reasons are for having such a
>>>>     VK> limitation.
>>>>
>>>> This limitation and its possible remedies are an interesting topic,
>>>> but really not for R-help:
>>>>
>>>> It will be a lot about C programming, the internal representation of R
>>>> objects, etc.
>>>> Very fascinating .... but for R-devel.
>>>>
>>>> "See you there!"
>>>> Martin
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>> Quoted from:
>>> http://www.nabble.com/allocMatrix-limits-tp18763791p18776531.html
>>>
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


