[Rd] internal copying in R (soon to be released R-3.1.0)

Simon Urbanek simon.urbanek at r-project.org
Mon Mar 3 19:37:34 CET 2014


On Mar 2, 2014, at 12:37 PM, Jens Oehlschlägel <jens.oehlschlaegel at truecluster.com> wrote:

> Dear core group,
> 
> Which operation in R guarantees to get a true copy of an atomic vector, not just a second symbol pointing to the same shared memory?
> 

None: there is no concept of "shared" memory at the R level. You seem to be mixing up C-level API specifics and the R language. At the C level, duplicate() creates a new copy.
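To make the C-level distinction concrete, here is a standalone toy model in plain C. It does not use R's headers: the struct `toy_vec`, its `named` field and the helpers `toy_new`, `toy_duplicate` and `toy_ensure_unshared` are hypothetical stand-ins for an R vector, its NAMED count and duplicate(). As an assumption, a count of 2 is read as "possibly shared", in line with the NAMED() < 2 test mentioned in the question. The helper copies only when the argument may be shared, which is the behaviour the question asks for.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy stand-in for an R integer vector; this is not R's SEXP. */
typedef struct {
    int named;      /* 0 = temporary, 1 = bound once, 2 = possibly shared */
    size_t length;
    int *data;
} toy_vec;

/* Allocate a zero-filled, not-yet-bound vector (named == 0). */
static toy_vec *toy_new(size_t length) {
    toy_vec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->data = calloc(length ? length : 1, sizeof *v->data);
    if (v->data == NULL) { free(v); return NULL; }
    v->named = 0;
    v->length = length;
    return v;
}

static void toy_free(toy_vec *v) {
    if (v != NULL) { free(v->data); free(v); }
}

/* Deep copy, playing the role of duplicate(): the copy starts unbound. */
static toy_vec *toy_duplicate(const toy_vec *v) {
    toy_vec *copy = toy_new(v->length);
    if (copy != NULL)
        memcpy(copy->data, v->data, v->length * sizeof *v->data);
    return copy;
}

/* Return v itself if nothing else can see it, otherwise a private copy
   (NULL if that allocation fails). */
static toy_vec *toy_ensure_unshared(toy_vec *v) {
    assert(v != NULL);
    return v->named >= 2 ? toy_duplicate(v) : v;
}
```

When a copy is returned, the caller owns it and must free it separately; the original still belongs to whatever else refers to it.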


> y <- x[]
> #?
> 
> y <- x
> y[1] <- y[1]
> #?
> 
> Is there any function that returns its argument as a non-shared atomic but only copies if the argument was shared?
> 
> Given an atomic vector x, what is the best official way to find out whether other symbols share the vector RAM? Querying NAMED() < 2 doesn't work because .Call sets sxpinfo_struct.named to 2. It even sets it to 2 if the argument to .Call was a never-named expression!?
> 
> > named(1:3)
> [1] 2
> 

Assuming that you are talking about the C API, please consider reading about the concepts involved. .Call() doesn't set NAMED to 2 at all: it passes whatever object it is given, so it is the C code's responsibility to handle incoming objects according to the desired semantics (see the previous post here).


> And it seems to set it permanently, pure read-access can trigger copy-on-modify:
> 
> > x <- integer(1e8)
> > system.time(x[1]<-1L)
>       User      System     Elapsed
>          0           0           0
> > system.time(x[1]<-2L)
>       User      System     Elapsed
>          0           0           0
> 
> Having called .Call now leads to an unnecessary copy on the next assignment:
> 
> > named(x)
> [1] 2
> > system.time(x[1]<-3L)
>       User      System     Elapsed
>       0.14        0.07        0.20
> > system.time(x[1]<-4L)
>       User      System     Elapsed
>          0           0           0
> 
> This does not only happen with user-written functions doing read access:
> 
> > is.unsorted(x)
> [1] TRUE
> > system.time(x[1]<-5L)
>       User      System     Elapsed
>       0.11        0.09        0.21
> 
> Why don't you simply give package authors read-access to sxpinfo_struct.named in .Call (without setting it to 2)? That would give us more control and also save some unnecessary copying.

Again, you're barking up the wrong tree: .Call() doesn't bump NAMED at all, it simply passes the object:

#include <Rinternals.h>
SEXP nam(SEXP x) { return ScalarInteger(NAMED(x)); }

> .Call("nam", 1+1)
[1] 0
> x=1+1
> .Call("nam", x)
[1] 1
> y=x
> .Call("nam", x)
[1] 2
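The session above shows the count going from 0 (a temporary such as 1+1) to 1 (bound to x) to 2 (also bound to y). A standalone toy model in plain C of how such a saturating count can drive copy-on-modify may help explain the timings quoted earlier: once the count is 2, one assignment pays for a full copy, the copy is bound to x alone, and later assignments are in place again. Everything here (`toy_vec`, `toy_bind`, `toy_assign_elt`) is hypothetical and much simpler than R's real rules; it does not model why the count reached 2 in the quoted session, only what follows once it has.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model of a NAMED-style count; not R's real code. */
typedef struct {
    int named;          /* 0 = temporary, 1 = bound once, 2 = possibly shared */
    size_t length;
    int *data;
} toy_vec;

/* A freshly computed value, like the result of 1+1: named == 0. */
static toy_vec *toy_new(size_t length) {
    toy_vec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->data = calloc(length ? length : 1, sizeof *v->data);
    if (v->data == NULL) { free(v); return NULL; }
    v->named = 0;
    v->length = length;
    return v;
}

static void toy_free(toy_vec *v) {
    if (v != NULL) { free(v->data); free(v); }
}

/* Binding the value to one more symbol bumps the count, saturating at 2
   (x = 1+1 gives 1; a further y = x gives 2). */
static void toy_bind(toy_vec *v) {
    if (v->named < 2) v->named++;
}

/* x[i] <- value: modify in place unless the vector may be shared.
   Returns 1 if a copy was made, 0 if modified in place, -1 if the copy
   could not be allocated. The old vector is NOT freed here, because
   another binding may still refer to it. */
static int toy_assign_elt(toy_vec **x, size_t i, int value) {
    int copied = 0;
    assert(i < (*x)->length);
    if ((*x)->named >= 2) {
        toy_vec *copy = toy_new((*x)->length);
        if (copy == NULL) return -1;
        memcpy(copy->data, (*x)->data, (*x)->length * sizeof *copy->data);
        toy_bind(copy);     /* the copy is bound to x alone: named == 1 */
        *x = copy;
        copied = 1;
    }
    (*x)->data[i] = value;
    return copied;
}
```

In this model the cost pattern follows directly from the saturating count: only the first assignment after the count reaches 2 copies, because the copy it produces is bound once.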

Cheers,
Simon




> I guess that once R switches to reference counting, this preventive increase in .Call could not be continued anyhow.
> 
> Kind regards
> 
> 
> Jens Oehlschlägel
> 
> P.S. Please cc me in answers, as I am not a member of r-devel.
> 
> 
> P.P.S. function named() was tentatively defined as follows:
> 
> named <- function(x)
>  .Call("R_bit_named", x, PACKAGE="bit")
> 
> SEXP R_bit_named(SEXP x){
>  SEXP ret_;
>  PROTECT( ret_ = allocVector(INTSXP,1) );
>  INTEGER(ret_)[0] = NAMED(x);
>  UNPROTECT(1);
>  return ret_;
> }
> 
> 
> > version
>               _
> platform       x86_64-w64-mingw32
> arch           x86_64
> os             mingw32
> system         x86_64, mingw32
> status         Under development (unstable)
> major          3
> minor          1.0
> year           2014
> month          02
> day            28
> svn rev        65091
> language       R
> version.string R Under development (unstable) (2014-02-28 r65091)
> nickname       Unsuffered Consequences
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 


