[R] Why does vector assignment in R recreate the entire vector?

Wed Sep 1 18:19:34 CEST 2010

Tal, 

For your first example, x is not duplicated in memory. If you compile R
with --enable-memory-profiling, you have access to the tracemem()
function, which will report whether x is duplicate()d:

> x <- rep(1,100)
> tracemem(x)
[1] "<0x8f71c38>"
> x[10] <- NA

This does not result in duplication of x, nor does assignment of x to y:

> y <- x

At this point, y refers to the same data as x. Only when we modify y is
x duplicated, so that y gets its own copy of the data:

> y[10] <- NA
tracemem[0x8f71c38 -> 0x91fff70]:
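Since tracemem() exists only in builds compiled with memory profiling, a
script can test for support before calling it. The sketch below assumes
that capabilities("profmem") reflects that build option; the variable
names are illustrative:

```r
# Only use tracemem() if this R build was compiled with memory profiling.
if (capabilities("profmem")) {
  x <- rep(1, 100)
  tracemem(x)     # start tracing x
  y <- x          # no copy yet: y and x share the same data
  y[10] <- NA     # copy-on-modify: tracemem() should print a line here
  untracemem(x)   # stop tracing
} else {
  message("This R build lacks memory profiling; tracemem() is unavailable.")
}
```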

Likewise, no duplication occurs when `[<-` is called directly:

> x <- rep(1,100)
> tracemem(x)
[1] "<0x8e44900>"
> x <- `[<-`(x, list=10, values=NA)

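But R is not yet smart enough to avoid a duplication here. replace() is
an ordinary R function that performs `x[list] <- values` on its own
argument x, which still shares data with the caller's x, so the data
must be copied first: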

> x <- rep(1,100)
> tracemem(x)
[1] "<0x915d580>"
> x <- replace(x, list=10, values=NA)
tracemem[0x915d580 -> 0x915e090]: replace 
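The copy can also be seen from the caller's side without tracemem().
Here is a minimal sketch using a hypothetical function, replace_sketch(),
written to mirror the body of base R's replace(). Because the function
modifies a copy, the vector passed in stays unchanged:

```r
# Hypothetical helper mirroring base::replace(): it subassigns into its
# own argument x and returns the modified copy.
replace_sketch <- function(x, list, values) {
  x[list] <- values  # x shares data with the caller's vector, so R copies it
  x
}

x <- rep(1, 100)
y <- replace_sketch(x, list = 10, values = NA)

print(anyNA(x))       # [1] FALSE  (the caller's x is untouched)
print(is.na(y[10]))   # [1] TRUE   (only the returned copy has the NA)
```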

Beyond these simple tests, it's difficult to know when R copies memory.
I mentioned in another post recently that subsetting a vector copies
memory into a new object, but this is not reported by tracemem(). For
example, y below has a different address from x, yet the two are identical:

> tracemem(x)
[1] "<0x915ed50>"
> y <- x[1:100]
> tracemem(y)
[1] "<0x915f3f0>"
> identical(x,y)
[1] TRUE

Fortunately, memory is fairly cheap, and memory operations are pretty
fast on modern operating systems like GNU/Linux. In my own code, the
rate-limiting steps are mostly computational routines such as exp(),
not memory copies.
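Whether the extra copy matters for large vectors can be checked on your
own machine. The following is a rough timing sketch, not a benchmark:
the vector length n, the update count k and the variable names are
arbitrary assumptions, and results will vary with hardware and R version.
It repeats a single-element update using subassignment and using
replace(), which copies the whole vector on each call:

```r
# Rough timing sketch; n and k are arbitrary illustrative values.
n <- 1e6   # vector length
k <- 50    # number of single-element updates

x_inplace <- rep(1, n)
t_inplace <- system.time(
  for (i in seq_len(k)) x_inplace[i] <- NA  # may modify the vector in place
)[["elapsed"]]

x_replace <- rep(1, n)
t_replace <- system.time(
  for (i in seq_len(k)) x_replace <- replace(x_replace, list = i, values = NA)  # full copy per call
)[["elapsed"]]

cat(sprintf("subassignment: %.3f s, replace(): %.3f s\n", t_inplace, t_replace))
```

If the replace() loop is noticeably slower, the difference most likely
comes from the repeated full copies rather than from the assignment itself.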

-Matt

On Wed, 2010-09-01 at 11:09 -0400, Tal Galili wrote:
> Hello all,
> 
> A friend recently brought to my attention that vector assignment actually
> recreates the entire vector on which the assignment is performed.
> 
> So for example, the code:
> x[10]<- NA # The original call (short version)
> 
> Is really doing this:
> x<- replace(x, list=10, values=NA) # The original call (long version)
> # assigning a whole new vector to x
> 
> Which is actually doing this:
> x<- `[<-`(x, list=10, values=NA) # The actual call
> 
> 
> Assuming this can be explained reasonably to the layman, my question is:
> why is it done this way?
> Why won't it just change the relevant pointer in memory?
> 
> On small vectors it makes no difference.
> But on big vectors this might be (so I suspect) costly (in terms of time).
> 
> 
> I'm curious for your responses on the subject.
> 
> Best,
> Tal
> 
> 
> 
> ----------------Contact Details:-------------------------------------------------------
> Contact me: Tal.Galili at gmail.com |  972-52-7275845
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> www.r-statistics.com (English)
> ----------------------------------------------------------------------------------------------
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Matthew S. Shotwell
Graduate Student 
Division of Biostatistics and Epidemiology
Medical University of South Carolina