[R] large data set, error: cannot allocate vector

Jason Barnhart jasoncbarnhart at msn.com
Sat May 6 01:48:35 CEST 2006


Please try memory.limit() (Windows only) to confirm how much system memory is
available to R.
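A minimal sketch of the memory checks (memory.limit() and memory.size() are Windows-only; on other platforms gc() is the portable way to inspect usage):

```r
## On Windows, memory.limit() reports the cap (in Mb) on R's total
## allocation, and memory.size(max = TRUE) the peak actually used.
## On all platforms, gc() shows current and peak Ncells/Vcells.
if (.Platform$OS.type == "windows") {
  memory.limit()
  memory.size(max = TRUE)
}
gc()   # "used" / "max used" columns, in cells and Mb
```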

Additionally, read.delim returns a data.frame.  You could use the colClasses 
argument to control the column types (see example below), or use scan(), which 
returns a vector and stores the data more compactly: the vector object is 
significantly smaller than the equivalent data.frame.

It appears from your example session that you are examining a single 
variable.  If so, a vector would suffice.
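A minimal sketch of the two approaches, assuming a one-column numeric file "temp.txt" like the one in the session below:

```r
## Assumes "temp.txt" holds one numeric value per line.
x  <- scan(file = "temp.txt")                # bare numeric vector
df <- read.delim("temp.txt", header = FALSE,
                 colClasses = "numeric")     # one-column data.frame
object.size(x) < object.size(df)             # the vector is the smaller object
```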

Note in the example below that summing large numbers stored as the integer 
type triggers an integer overflow.

====================Begin Session====================================
> #create vector
> foovector<-scan(file="temp.txt")
Read 2490368 items
>
> #create data.frame
> foo<-read.delim(file="temp.txt",row.names=NULL,header=FALSE,colClasses=as.vector(c("numeric")))
> attributes(foo)$names<-"myfoo"
>
> foo2<-read.delim(file="temp.txt",row.names=NULL,header=FALSE,colClasses=as.vector(c("integer")))
> attributes(foo2)$names<-"myfoo"
>
> #vector from data.frame
> tmpfoo<-foo$myfoo
>
> #check size
> object.size(foo)
[1] 119538076
> object.size(foo2)
[1] 109576604
> object.size(foovector)
[1] 19922972
> object.size(tmpfoo)
[1] 19922972
>
> #check sums
> sum(tmpfoo)
[1] 2.498528e+13
> sum(foo$myfoo)
[1] 2.498528e+13
> sum(foo2$myfoo)
[1] NA
Warning message:
Integer overflow in sum(.); use sum(as.numeric(.))
> sum(foovector)
[1] 2.498528e+13
>
> #show type
> class(foo2$myfoo)
[1] "integer"
> class(foo$myfoo)
[1] "numeric"
> class(tmpfoo)
[1] "numeric"
> class(foovector)
[1] "numeric"
====================End Session====================================
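If the integer representation is kept for its smaller footprint, the overflow seen above can be avoided by coercing to numeric before summing, as the warning message itself suggests:

```r
## Summing a large integer vector can exceed the 32-bit integer
## range; coercing to numeric (double) first avoids the overflow.
x <- as.integer(c(2e9, 2e9))   # each value fits in an integer
sum(x)                         # NA, with an integer-overflow warning
sum(as.numeric(x))             # 4e+09, computed in double precision
```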

----- Original Message ----- 
From: "Robert Citek" <rwcitek at alum.calberkeley.org>
To: <r-help at stat.math.ethz.ch>
Sent: Friday, May 05, 2006 3:15 PM
Subject: Re: [R] large data set, error: cannot allocate vector


>
> On May 5, 2006, at 11:30 AM, Thomas Lumley wrote:
>> In addition to Uwe's message it is worth pointing out that gc()
>> reports
>> the maximum memory that your program has used (the rightmost two
>> columns).
>> You will probably see that this is large.
>
> Reloading the 10 MM dataset:
>
> R > foo <- read.delim("dataset.010MM.txt")
>
> R > object.size(foo)
> [1] 440000376
>
> R > gc()
>            used  (Mb) gc trigger  (Mb) max used  (Mb)
> Ncells 10183941 272.0   15023450 401.2 10194267 272.3
> Vcells 20073146 153.2   53554505 408.6 50086180 382.2
>
> Combined, Ncells and Vcells appear to take up about 700 MB of RAM,
> which is about 25% of the 3 GB available under Linux on 32-bit
> architecture.  Also, removing foo seemed to free up "used" memory,
> but didn't change the "max used":
>
> R > rm(foo)
>
> R > gc()
>          used (Mb) gc trigger  (Mb) max used  (Mb)
> Ncells 186694  5.0   12018759 321.0 10194457 272.3
> Vcells  74095  0.6   44173915 337.1 50085563 382.2
>
> Regards,
> - Robert
> http://www.cwelug.org/downloads
> Help others get OpenSource software.  Distribute FLOSS
> for Windows, Linux, *BSD, and MacOS X with BitTorrent
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
>
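One note on the gc() output quoted above: the "max used" columns are cumulative, which is why removing foo did not lower them.  They can be cleared with gc(reset = TRUE) to measure the peak of a later step in isolation; a minimal sketch:

```r
## "max used" accumulates since startup (or the last reset), so it
## does not drop when objects are removed.
gc(reset = TRUE)      # zero the "max used" statistics
x <- numeric(1e6)     # allocate roughly 8 MB of doubles
rm(x)
gc()                  # "max used" now reflects only this step
```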
