[R] Memory usage and limit

Paul Roebuck roebuck at mdanderson.org
Thu Apr 27 07:32:44 CEST 2006


On Wed, 26 Apr 2006, Min Shao wrote:

> I recently made a 64-bit build of R-2.2.1 under Solaris 9 using gcc v.3.4.2.
> The server has 12GB memory, 6 Sparc CPUs and plenty of swap space. I was the
> only user at the time of the following experiment.
>
> I wanted to benchmark R's capability to read large data files and used a
> data set consisting of 2MM records with 65 variables in each row. All but 2
> of the variables are of the character type and the other two are numeric.
> The whole data set is about 600 MB when stored as a plain ASCII file.
>
> The following code was used in the benchmarking runs:
>
>      c = list(var1=0, var2=0, var3="", var4="", .....var65="")
>      A <- scan("test.dat", skip = 1, sep = ",", what = c, nmax=XXXXX,
>                quiet=FALSE)
>      summary(A)
> where XXXXX = 1000000 or 2000000
>
> I made two runs with nmax=1000000 and nmax=2000000 respectively. The first
> run completed successfully, in about an hour of CPU time. However, the actual
> memory usage exceeded 2.2GB, about 7 times the actual file size on disk.
> The second run aborted when the memory usage reached 4GB. The error message
> is "vector memory exhausted (limit reached?)".
>
> Three questions:
> 1) Why was so much memory and CPU time consumed to read 300MB of data? Since
> almost all of the variables are character, I expected an almost 1-1 mapping
> between file size on disk and that in memory.
> 2) Since this is a 64-bit build, I expected it could handle more than the
> 600MB of data I used. What does the error message mean? I don't believe the
> vector length exceeded the theoretical limit of about 1 billion.
> 3) The original file was compressed and I had to uncompress it before the
> experiment. Is there a way to read compressed files directly in R?

A <- scan(gzfile("test.dat.gz", "r"),
          skip = 1,
          sep  = ",",
          what = c,
          nmax = XXXXX,
          quiet= FALSE)
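
For anyone who wants to try this without the original data, here is a
small self-contained sketch of the same idea. The temporary file, the
three column names and the name 'fields' are made up for illustration
(they are not from the original post); only gzfile(), scan() and
object.size() are standard R. It writes a tiny gzip-compressed CSV,
reads it back through a gzfile() connection, and prints the size of the
result so it can be compared with the size of the text on disk.

```r
# Build a small gzip-compressed CSV so the example is self-contained.
# The file and its three columns are hypothetical stand-ins for test.dat.gz.
path <- tempfile(fileext = ".csv.gz")
con <- gzfile(path, "w")
writeLines(c("var1,var2,var3",
             "1,2.5,alpha",
             "2,3.5,beta",
             "3,4.5,gamma"), con)
close(con)

# Template giving the type of each field, as in the scan() call above.
# Named 'fields' rather than 'c' so that base::c() is not masked.
fields <- list(var1 = 0, var2 = 0, var3 = "")

# Open the compressed file for reading; scan() decompresses on the fly.
con <- gzfile(path, "r")
A <- scan(con, skip = 1, sep = ",", what = fields, nmax = 2, quiet = TRUE)
close(con)   # close explicitly so the connection is not left open

str(A)                  # two records read, because nmax = 2
print(object.size(A))   # in-memory size, to compare with the text on disk
unlink(path)
```

Keeping the connection in a variable and closing it afterwards is a
design choice of the sketch: a connection created inline in the scan()
call is already open and is not closed by scan().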

----------------------------------------------------------
SIGSIG -- signature too long (core dumped)



