[R] Error: cannot allocate vector of size...

Peter Dalgaard p.dalgaard at biostat.ku.dk
Tue Nov 10 15:19:31 CET 2009


maiya wrote:
> OK, it's the simple math that's confusing me :)
> 
> So you're saying 2.4GB, while windows sees the data as 700KB. Why is that
> different?

700_MB_, I assume!

In a nutshell, a single-character column and a spacer take 2 bytes per 
subject in the file, but a floating point variable takes 8 bytes in 
memory, and R is not good at detecting things that can be compressed. 
At best, you can ensure that variables are read as integers or factors, 
bringing the storage requirement down to four bytes per value.
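
As a rough illustration of that arithmetic (a sketch only: the vector 
length, the toy table and its column names a and b are made up for the 
example and have nothing to do with the actual file):

```r
# Storage per value: doubles take 8 bytes, integers 4.
n <- 1e6
x_num <- numeric(n)   # double precision
x_int <- integer(n)   # 32-bit integer

bytes_num <- as.numeric(object.size(x_num))
bytes_int <- as.numeric(object.size(x_int))
bytes_num / n   # close to 8 (plus a small fixed header)
bytes_int / n   # close to 4

# Back-of-envelope size of 3 million rows x 100 columns
rows <- 3e6
cols <- 100
gb_double  <- rows * cols * 8 / 1e9   # 2.4 GB if every column is double
gb_integer <- rows * cols * 4 / 1e9   # 1.2 GB if every column is integer

# Declaring the column types up front with colClasses (toy data)
txt <- "a b\n1 2\n3 4\n"
dd <- read.table(text = txt, header = TRUE,
                 colClasses = c("integer", "integer"))
str(dd)
```

Giving colClasses also spares read.table the work of guessing the types, 
which tends to help on big files, though even at four bytes per value the 
full table would still be large for a 32-bit session.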

The "ff" package might be something for you. The most strongly 
compressed items have trouble with storing NA values, though.

-p


> And let's say I could potentially live with e.g. 1/3 of the cases - that
> would make it .8GB, which should be fine? But then my question is if there
> is any way to sample the rows in read.table? Or what would be the best way
> of importing a random third of my cases?
> 
> Thanks!
> 
> M.
> 
> 
> 
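
One possible way to get a random third of the rows, sketched under 
assumptions: read_sample() is a hypothetical helper, not an existing R 
function, and it presumes a plain-text file with one header line and one 
case per line. It reads the file in chunks through a connection, keeps 
each line with probability p, and parses only the kept lines.

```r
# Hypothetical helper: keep each data row with probability p.
read_sample <- function(path, p = 1/3, chunk = 10000L, ...) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1L)          # first line holds the names
  kept <- list()
  repeat {
    lines <- readLines(con, n = chunk)      # next block of raw lines
    if (length(lines) == 0L) break          # end of file
    kept[[length(kept) + 1L]] <- lines[runif(length(lines)) < p]
  }
  # Parse only the sampled lines; extra arguments go to read.table
  read.table(text = c(header, unlist(kept)), header = TRUE, ...)
}

# Small self-contained demonstration on a temporary file
set.seed(1)
tmp <- tempfile(fileext = ".dat")
write.table(data.frame(id = 1:300, g = rep(1:3, 100)), tmp,
            row.names = FALSE, quote = FALSE)
dd <- read_sample(tmp, p = 1/3, chunk = 50L)
nrow(dd)   # about a third of 300; the exact count depends on the seed
```

The sampled lines are held as text before being parsed, so the peak 
memory use is somewhat above that of the final data frame. The number of 
rows kept is random rather than exactly one third.
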
> jholtman wrote:
>> A little simple math.  You have 3M rows with 100 items on each row.
>> If read in this would be 300M items.  If numeric, 8 bytes/item, this
>> is 2.4GB.  Given that you are probably using a 32 bit version of R,
>> you are probably out of luck.  A rule of thumb is that your largest
>> object should consume at most 25% of your memory since you will
>> probably be making copies as part of your processing.
>>
>> Given that, if you want to read in 100 variables at a time, I would
>> say your limit would be about 500K rows to be reasonable.  So you have
>> a choice: read in fewer rows, read in all 3M rows but at 20 columns
>> per read, or put the data in a database and extract what you need.
>> Unless you go to a 64-bit version of R you will probably not be able
>> to have the whole file in memory at one time.
>>
>> On Tue, Nov 10, 2009 at 7:10 AM, maiya <maja.zaloznik at gmail.com> wrote:
>>> I'm trying to import a table into R; the file is about 700MB. Here's my
>>> first try:
>>>
>>>> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
>>> Error: cannot allocate vector of size 15.6 Mb
>>> In addition: Warning messages:
>>> 1: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>>>  Reached total allocation of 1535Mb: see help(memory.size)
>>> 2: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>>>  Reached total allocation of 1535Mb: see help(memory.size)
>>> 3: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>>>  Reached total allocation of 1535Mb: see help(memory.size)
>>> 4: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>>>  Reached total allocation of 1535Mb: see help(memory.size)
>>>
>>> Then I tried
>>>
>>>> memory.limit(size=4095)
>>>  and got
>>>
>>>> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
>>> Error: cannot allocate vector of size 11.3 Mb
>>>
>>> but no additional errors. Then optimistically to clear up the workspace:
>>>
>>>> rm()
>>>> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
>>> Error: cannot allocate vector of size 15.6 Mb
>>>
>>> Can anyone help? I'm confused by the values even: 15.6Mb, 1535Mb, 11.3Mb?
>>> I'm working on WinXP with 2 GB of RAM. Help says the maximum obtainable
>>> memory is usually 2Gb. Surely they mean GB?
>>>
>>> The file I'm importing has about 3 million cases with 100 variables that
>>> I want to crosstabulate each with each. Is this completely unrealistic?
>>>
>>> Thanks!
>>>
>>> Maja
>>> --
>>> View this message in context:
>>> http://old.nabble.com/Error%3A-cannot-allocate-vector-of-size...-tp26282348p26282348.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>> -- 
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>>
>>
> 


-- 
    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907



