[Rd] Best practices for writing R functions (really copying)

Matt Shotwell matt at biostatmatt.com
Mon Jul 25 18:44:56 CEST 2011


Also consider subsetting:

cat("a: "); print(system.time( { A <- matrix(c(1.0,1.1),50000,1000); 0 } ))
cat("h: "); print(system.time( { sum(A[1:50000,1:1000]) } ))
cat("i: "); print(system.time( { sum(A[]) } ))
cat("j: "); print(system.time( { sum(A) } ))

In contrast with Python's NumPy array, the R array type has no concept
of 'viewing' the array contents in different ways. Instead, the contents
are copied or adjusted. Subsetting and matrix transposition are examples
of transformations that might be considered alternate 'views' of an
array. This is especially painful in the example above, because
A[1:50000,1:1000], A[], and A evaluate to identical() arrays. In case h:
the array is copied element-wise. In case i: A is duplicate()d. In case j: A
is not copied.
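
As a rough sketch of the point above, the following checks that the three
expressions really are identical() and then times each one. The matrix is
smaller than in the timings above so that it runs quickly, and elapsed() is
a hypothetical helper introduced only for this illustration. The timings
depend on the machine and the R version, so no particular numbers are
claimed here.

```r
# Smaller matrix than in the timings above, so this runs quickly.
n <- 2000L
m <- 500L
A <- matrix(c(1.0, 1.1), n, m)

# All three expressions yield the same values and attributes.
same_full_index  <- identical(A[1:n, 1:m], A)
same_empty_index <- identical(A[], A)

# Hypothetical helper (not part of R): elapsed seconds for calling f() reps times.
elapsed <- function(f, reps = 20L) {
  system.time(for (k in seq_len(reps)) f())[["elapsed"]]
}

# Cases h, i, and j from the example above, on the smaller matrix.
timings <- c(
  h = elapsed(function() sum(A[1:n, 1:m])),
  i = elapsed(function() sum(A[])),
  j = elapsed(function() sum(A))
)
print(timings)
```

On builds of R with memory profiling enabled, tracemem(A) may also be used
to report duplications of A directly, rather than inferring them from timing.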

Matt

On Mon, 2011-07-25 at 11:53 -0400, Radford Neal wrote:
> Gabriel Becker writes:
> 
>   AFAIK R does not automatically copy function arguments. R actually tries
>   very hard to avoid copying while maintaining "pass by value" functionality.
> 
>   ... R only copies data when you modify an object, not
>   when you simply pass it to a function.
> 
> This is a bit misleading.  R tries to avoid copying by maintaining a
> count of how many references there are to an object, so that x[i] <- 9
> can be done without a copy if x is the only reference to the vector.
> However, it never decrements such counts.  As a result, simply passing
> x to a function that accesses but does not change it will result in x
> being copied if x[i] is changed after that function returns.  An
> exception is that this usually isn't the case if x is passed to a
> primitive function.  But note that not all standard functions are 
> technically "primitive".
> 
> The end result is that it's rather difficult to tell when copying will
> be done.  Try the following test, for example:
> 
>   cat("a: "); print(system.time( { A <- matrix(c(1.0,1.1),50000,1000); 0 } ))
>   cat("b: "); print(system.time( { A[1,1]<-7; 0 } ))
>   cat("c: "); print(system.time( { B <- sqrt(A); 0 } ))
>   cat("d: "); print(system.time( { A[1,1]<-7; 0 } ))
>   cat("e: "); print(system.time( { B <- t(A); 0 } ))
>   cat("f: "); print(system.time( { A[1,1]<-7; 0 } ))
>   cat("g: "); print(system.time( { A[1,1]<-7; 0 } ))
> 
> You'll find that the time printed after b:, d:, and g: is near zero,
> but that there is non-negligible time for f:.  This is because sqrt
> is primitive but t is not, so the modification to A after the call
> t(A) requires that a copy be made.
> 
>    Radford Neal
> 


