[Rd] R 3.0.0 memory use

Tim Hesterberg timhesterberg at gmail.com
Mon Apr 15 00:22:00 CEST 2013


I benchmarked some data frame code, and it appears that R 3.0.0
allocates space for far more large objects than earlier versions of R
do for data frame operations: creation, subscripting, and
subscript replacement.
For a data frame with n rows, it makes either 2 or 4 extra copies of
all of:
        8n bytes (e.g. double precision)
        24n bytes
        32n bytes
E.g., for as.data.frame(numeric vector), instead of allocations
totalling ~8n bytes, it allocates 33 times that much.
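
The factor of 33 can be checked against the R 3.0.0 entry for
as.data.frame(y) in the summary below, which reads 5;4;4, i.e. 5, 4 and 4
allocations of 8n, 24n and 32n bytes respectively. A small sketch of that
arithmetic (the variable names are only illustrative):

```r
# Copy counts for as.data.frame(y) under R 3.0.0, read from the summary
# table as 5;4;4 (allocations of 8n, 24n and 32n bytes respectively).
copies        <- c(5, 4, 4)
bytes_per_row <- c(8, 24, 32)

# Total bytes allocated per row of the data frame.
total_per_row <- sum(copies * bytes_per_row)

# Ratio relative to a single 8n-byte double vector.
ratio <- total_per_row / 8
print(c(total_per_row = total_per_row, ratio = ratio))
```

That is 40n + 96n + 128n = 264n bytes, or 33 times the 8n bytes of the
original vector.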

In the summary below, compare columns 3 and 5 (R-2.15.3 and R-3.0.0,
both without the dataframe package); columns 2 and 4 are with the
dataframe package.

# Summary
#                               R-2.14.2        R-2.15.3        R-3.0.0
#                               w/o     with    w/o     with    w/o
#       as.data.frame(y)        3       1       1       1       5;4;4
#       data.frame(y)           7       3       4       2       6;2;2
#       data.frame(y, z)        7 each  3 each  4       2       8;4;4
#       as.data.frame(l)        8       3       5       2       9;4;4
#       data.frame(l)           13      5       8       3       12;4;4
#       d$z <- z                3,2     1,1     3,1     2,1     7;4;4,1
#       d[["z"]] <- z           4,3     1,1     3,1     2,1     7;4;4,1
#       d[, "z"] <- z           6,4,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
#       d["z"] <- z             6,5,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
#       d["z"] <- list(z=z)     6,3,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
#       d["z"] <- Z #list(z=z)  6,2,2   2,1,1   4,1,2   3,1,1   8;4;4,1,2
#       a <- d["y"]             2       1       2       1       6;4;4
#       a <- d[, "y", drop=F]   2       1       2       1       6;4;4

# Where two numbers are given, they refer to:
#   (copies of the old data frame),
#   (copies of the new column)
# A third number refers to the number of
#   (copies made of an integer vector of row names)

# For R 3.0.0, I'm getting astounding results - many more copies,
# and also some copies of larger objects; in addition to the data
# vectors of size 80K and 160K, also 240K and 320K.
# Where three numbers are given in form a;c;d, they refer to
#   (copies of 80K; 240K; 320K)

The benchmarks are at
http://www.timhesterberg.net/r-packages/memory.R

I'm using versions of R I installed from source on a Linux box, using e.g.
./configure --prefix=(my path) --enable-memory-profiling --with-readline=no
make
make install
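
A build configured with --enable-memory-profiling makes Rprofmem() and
tracemem() available. As a rough sketch of how large allocations could be
counted for one operation (this is not necessarily how memory.R does it;
the helper name count_large_allocs and the 50000-byte threshold are
assumptions made for illustration):

```r
# Hypothetical helper: tabulate the sizes of large vector allocations
# made while evaluating one expression. Requires memory profiling support.
count_large_allocs <- function(expr, threshold = 50000) {
  if (!capabilities("profmem"))
    stop("this build of R does not support memory profiling")
  logfile <- tempfile()
  on.exit({ Rprofmem(NULL); unlink(logfile) }, add = TRUE)
  Rprofmem(logfile, threshold = threshold)  # log allocations above threshold
  force(expr)                               # evaluate the promise while logging
  Rprofmem(NULL)                            # stop logging before reading the file
  lines <- readLines(logfile)
  # Keep only allocation records ("<bytes> :<call stack>"); skip "new page" lines.
  lines <- grep("^[0-9]+ :", lines, value = TRUE)
  sizes <- as.numeric(sub(" .*$", "", lines))
  table(sizes)
}

if (capabilities("profmem")) {
  n <- 10000
  y <- rnorm(n)   # created outside the profiled expression
  print(count_large_allocs(as.data.frame(y)))
}
```

The table names are allocation sizes in bytes and the values are how many
allocations of that size were logged; the exact sizes include a small
per-vector header, so they will be slightly larger than 8n, 24n or 32n.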


