[Rd] Measuring and comparing .C and .Call overhead
Simon Urbanek
simon.urbanek at r-project.org
Tue Jul 19 20:47:46 CEST 2011
On Jul 19, 2011, at 11:07 AM, Alireza Mahani wrote:
> Further pursuing my curiosity to measure the efficiency of R/C++ interface, I
> conducted a simple matrix-vector multiplication test using .C and .Call
> functions in R. In each case, I measured the execution time in R, as well as
> inside the C++ function. Subtracting the two, I came up with a measure of
> overhead associated with each call. I assume that this overhead would be
> non-existent if the entire code were implemented in C++. [Please let me know
> if my approach is somehow faulty.] The results of this simple test have
> significant implications for my software development approach. I hope others
> find this information useful as well. I have attached the code files in the
> bottom. Below I provide a summary output, and interpret the results. [All
> times in table below are in microseconds]
>
> m    n    .Call R   .Call C++   .Call overhead   .C R    .C C++   .C overhead
> 100  10   3.23      1.33        *1.90*           16.89   0.96     *15.93*
> 200  15   5.38      3.44        *1.94*           30.77   2.81     *27.96*
> 500  20   12.48     10.4        *2.08*           78.12   9.37     *68.75*
>
Your "overhead" calculations are anything but, since they include a loop, value replacement, arithmetic, and unnecessary function calls that have nothing to do with .Call (you don't need as.double() for .Call - it is more efficient to call coerceVector() in the C code, or to throw an error, depending on what you want).
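A minimal sketch of the coerceVector() route (not a drop-in for the poster's function; the actual multiply is elided, and this fragment only compiles against R's headers):

```c
#include <Rinternals.h>

SEXP matvecMultiply(SEXP A, SEXP x) {
    /* coerceVector() returns its argument unchanged when it is already
       a REALSXP, so the R caller can drop the per-call as.double() */
    PROTECT(A = coerceVector(A, REALSXP));
    PROTECT(x = coerceVector(x, REALSXP));
    double *rA = REAL(A), *rx = REAL(x);
    /* alternatively, refuse non-double input instead of coercing:
       if (TYPEOF(A) != REALSXP) error("A must be a double vector"); */
    /* ... multiply as before using rA and rx ... */
    UNPROTECT(2);
    return R_NilValue;
}
```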
In addition, if you really cared about overhead, you would avoid symbol lookup millions of times:
foo.c:
#include <Rinternals.h>
SEXP foo() { return R_NilValue; }
> system.time(for(i in 1:1e6) .Call("foo"))
user system elapsed
4.943 0.016 4.960
> foo = getNativeSymbolInfo("foo")$address
> system.time(for(i in 1:1e6) .Call(foo))
user system elapsed
0.363 0.001 0.365
This is very close to just calling a constant closure with the same result:
> f=function() NULL
> system.time(for(i in 1:1e6) f())
user system elapsed
0.238 0.001 0.239
So .Call overhead is essentially negligible - and not what you were measuring ;).
There is only a tiny overhead in forcing the argument promise, but it's constant:
> x=rnorm(1e6)
> system.time(for(i in 1:1e6) .Call(foo,x))
user system elapsed
0.455 0.001 0.456
> x=rnorm(1e9)
> system.time(for(i in 1:1e6) .Call(foo,x))
user system elapsed
0.454 0.000 0.455
and it's again no different than for a closure:
> f=function(x) NULL
> system.time(for(i in 1:1e6) f())
user system elapsed
0.259 0.001 0.259
> system.time(for(i in 1:1e6) f(x))
user system elapsed
0.329 0.001 0.329
so this is inherent to any call. .Call is the fastest you can get; there is nothing faster (other than hacking your code into R internals...).
Cheers,
Simon
> Interpretation:
>
> 1- .Call overhead holds nearly constant, i.e. independent of data size. This
> is expected since .Call works by passing pointers. (How can we explain the
> slight increase in overhead?)
> 2- C++ times for .C are somewhat better than for .Call. This is likely due
> to the overhead associated with unpacking the SEXP pointers in a .Call
> function.
> 3- The overhead for .C dominates the execution time. For a 500x20 matrix,
> the overhead is ~90% of total time. This means that whenever we need to make
> repeated calls to a C/C++ function from R, and when performance is important
> to us, .Call is much preferred to .C, even at modest data sizes.
> 4- Overhead for .C scales sub-linearly with data size. I imagine that this
> overhead can be reduced through registering the functions and using the
> style field in R_CMethodDef to optimize data transfer (per Section 5.4 of
> "Writing R Extensions"), but perhaps not by more than a half.
> 5- Even using .Call, the overhead is still a significant percentage of total
> time, though the contribution decreases with data size (and compute load of
> function). However, in the context of parallelization benefits, this
> overhead puts an unpleasant cap on achievable gains. For example, even if
> the compute time goes down by 100x, an overhead of even 10% of total time
> means the effective gain saturates at 10x. In other words, effective
> parallelization can only be achieved by moving big chunks of the code to
> C/C++ so as to avoid repeated calls from R to C.
>
> I look forward to your comments.
>
> -Alireza
> --------------------------------
> #code file: mvMultiply.r
> #measuring execution time for a matrix-vector multiplication (A%*%x) using .C and .Call functions
> #execution time is measured both in R and inside the C++ functions; subtracting the two is used as an indicator of the overhead associated with each call
>
> rm(list = ls())
> system("R CMD SHLIB mvMultiply.cc")
> dyn.load("mvMultiply.so")
>
> m <- 100 #number of rows in matrix A
> n <- 10 #number of columns in matrix A (= number of rows in vector x)
> N <- 1000000
>
> A <- runif(m*n)
> x <- runif(n)
>
> # measuring .Call
> tCall_c <- 0.0
> t1 <- proc.time()[3]
> for (i in 1:N) {
> tCall_c <- tCall_c + .Call("matvecMultiply", as.double(A), as.double(x))
> }
> tCall_R <- proc.time()[3] - t1
> cat(".Call - Time measured in R: ", round(tCall_R,2), "sec\n")
> cat(".Call - Time measured in C++: ", round(tCall_c,2), "sec\n")
> cat(".Call - Implied overhead: ", round(tCall_R,2) - round(tCall_c,2), "sec -> per call: ", 1000000*(round(tCall_R,2) - round(tCall_c,2))/N, "usec\n")
>
> # measuring .C
> tC_c <- 0.0
> t1 <- proc.time()[3]
> for (i in 1:N) {
> tC_c <- tC_c + .C("matvecMultiply2", as.double(A), as.double(x), double(m), as.integer(m), as.integer(n), t = double(1))$t
> }
> tC_R <- proc.time()[3] - t1
> cat(".C - Time measured in R: ", round(tC_R,2), "sec\n")
> cat(".C - Time measured in C++: ", round(tC_c,2), "sec\n")
> cat(".C - Implied overhead: ", round(tC_R,2) - round(tC_c,2), "sec -> per call: ", 1000000*(round(tC_R,2) - round(tC_c,2))/N, "usec\n")
>
> dyn.unload("mvMultiply.so")
> --------------------------------
> #code file: mvMultiply.cc
> #include <Rinternals.h>
> #include <R.h>
> #include <sys/time.h>
>
> extern "C" {
>
> SEXP matvecMultiply(SEXP A, SEXP x) {
> timeval time1, time2;
> gettimeofday(&time1, NULL);
> SEXP execTime_sxp;
> PROTECT(execTime_sxp = allocVector(REALSXP, 1));
> double *execTime; execTime = REAL(execTime_sxp);
> double *rA = REAL(A), *rx = REAL(x), *ry;
> int n = length(x);
> int m = length(A) / n;
> SEXP y;
> PROTECT(y = allocVector(REALSXP, m));
> ry = REAL(y);
> for (int i = 0; i < m; i++) {
> ry[i] = 0.0;
> for (int j = 0; j < n; j++) {
> ry[i] += rA[j * m + i] * rx[j];
> }
> }
> UNPROTECT(1);
> gettimeofday(&time2, NULL);
> *execTime = (time2.tv_sec - time1.tv_sec) + (time2.tv_usec - time1.tv_usec)/1000000.0;
> UNPROTECT(1);
> return execTime_sxp;
> }
>
> void matvecMultiply2(double *A, double *x, double *y, int *m_ptr, int *n_ptr, double *execTime) {
> timeval time1, time2;
> gettimeofday(&time1, NULL);
> int m = *m_ptr;
> int n = *n_ptr;
> for (int i = 0; i < m; i++) {
> y[i] = 0.0;
> for (int j = 0; j < n; j++) {
> y[i] += A[j * m + i] * x[j];
> }
> }
> gettimeofday(&time2, NULL);
> *execTime = (time2.tv_sec - time1.tv_sec) + (time2.tv_usec - time1.tv_usec)/1000000.0;
> }
>
> }
> --------------------------------
>
>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
More information about the R-devel mailing list