[Rd] Possibility for memory improvement: x <- as.vector(x) always(?) duplicates

Henrik Bengtsson hb at biostat.ucsf.edu
Tue Nov 30 07:40:22 CET 2010


FYI,

from the recent R devel NEWS file:

as.vector() and as.double() etc duplicate less when they leave the
mode unchanged but remove attributes.
as.vector(mode = "any") no longer duplicates when it does not remove
attributes. This helps memory usage in matrix() and array().

This improvement will cut down on memory allocation and garbage
collection for most of us (by "a lot", I would like to add).  For
instance, the most common matrix() use cases no longer create
duplicated copies.

% Rterm
R version 2.13.0 Under development (unstable) (2010-11-26 r53672)
[...]
> x <- 1:10;
> x <- as.vector(x);
> tracemem(x);
[1] "<0x00000000038e1538>"
> x <- as.vector(x);
> x <- as.vector(x);
> x <- as.vector(x);

and so on: no tracemem messages appear, that is, no duplication is reported.

> x <- 1:10;
> tracemem(x);
[1] "<0x00000000038e1590>"
> x <- matrix(x, nrow=5, ncol=2);
> x <- as.matrix(x);
> x <- as.matrix(x);
> x <- as.matrix(x);

and so on.  Compare this with what I reported in my previous message (below).

Browsing the SVN logs for R devel, I see that Brian Ripley is the one
who did all the great work on this.

Thank you!

/Henrik

On Tue, Nov 23, 2010 at 6:12 AM, Henrik Bengtsson <hb at biostat.ucsf.edu> wrote:
> Hi,
>
> I've noticed that as.vector() always allocates a new object, e.g.
>
>> x <- 1:10;
>> x <- as.vector(x);
>> tracemem(x);
> [1] "<0x0000000005622db8>"
>> x <- as.vector(x);
> tracemem[0x0000000005622db8 -> 0x0000000005622ec0]: as.vector
>> x <- as.vector(x);
> tracemem[0x0000000005622ec0 -> 0x0000000005622f18]: as.vector
>> x <- as.vector(x);
> tracemem[0x0000000005622f18 -> 0x000000000561c388]: as.vector
>> x <- as.vector(x);
> tracemem[0x000000000561c388 -> 0x000000000561c3e0]: as.vector
>>
>
> and so on.
>
> This also seems to be the reason an extra copy is created when
> turning a vector into a matrix or an array, e.g.
>
>> x <- 1:10;
>> tracemem(x);
> [1] "<0x000000000561c750>"
>> x <- matrix(x, nrow=5, ncol=2);
> tracemem[0x000000000561c750 -> 0x000000000561c7a8]: as.vector matrix
>>
>
> Example of how it could work (not sure if the test with is.vector() is enough):
>
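> as.vector <- function(x, mode="any") {
>  # Skip the copy only when no coercion is requested and x is already a vector
>  if (mode == "any" && is.vector(x)) return(x);
>  # Otherwise defer to base, passing the requested mode along
>  base::as.vector(x, mode=mode);
> } # as.vector()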
>
> matrix <- base::matrix;
> environment(matrix) <- globalenv();
>
>> x <- 1:10;
>> tracemem(x);
> [1] "<0x0000000003965488>"
>> x <- matrix(x, nrow=5, ncol=2);
>>
>
> Could this be a generic improvement?  Some years ago, similar
> improvements were made for as.integer(), as.numeric(), etc.
>
> This is on R v2.12.0 patched (2010-11-09 r53543) and R v2.13.0 devel
> (2010-11-20 r53645) on Windows 7 Ultimate.
>
> /Henrik
>
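
On the doubt raised in the quoted message about whether the
is.vector() test is enough: the minimal sketch below suggests it is
not for named vectors, because is.vector() accepts names while
as.vector() drops them from atomic vectors.  The function name
as_vector_shortcut and the example data are hypothetical, chosen here
only for illustration and to avoid masking base::as.vector.

```r
# Hypothetical example data: a named numeric vector.
x <- c(a = 1, b = 2)

is.vector(x)          # TRUE: names do not prevent is.vector() from being TRUE
names(as.vector(x))   # NULL: as.vector() removes names from atomic vectors

# A renamed variant of the shortcut idea from the quoted message
# (hypothetical name; not part of base R).
as_vector_shortcut <- function(x, mode = "any") {
  if (mode == "any" && is.vector(x)) return(x)
  base::as.vector(x, mode = mode)
}

# The shortcut returns x with its names intact, so the results differ.
identical(as_vector_shortcut(x), as.vector(x))   # FALSE

# For a plain vector without attributes the two agree.
identical(as_vector_shortcut(1:10), as.vector(1:10))   # TRUE
```

So a shortcut of this kind would also need to check that x carries no
attributes at all, for example with is.null(attributes(x)), before
returning x unchanged.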


