[Rd] improving performance of slicing on matrices and their S4 derivatives

Sklyar, Oleg (London) osklyar at maninvestments.com
Fri Mar 27 16:56:31 CET 2009


Dear list.

It is a known issue that accessing slots of S4 objects and in particular
accessing .Data slots is slow in R. However, what surprises me are two
things demonstrated in the code below (runnable with 'inline', my times
are in the comments):

- copying data out of a large 1e7x3 .Data slot into a matrix can easily
be made 3-4 times faster than accessing the .Data slot, which I believe
only grabs a reference (and since the copy could be avoided, the
acceleration should be even more dramatic). It is surprising that this
memory-inefficient operation is faster than something as simple as
getting a reference!

- getting a column, or columns, from an atomic R matrix or actually an
S4 object derived from it, can be up to 10 times faster than using
standard slicing with the [-operator (yes, less generic, but with such
performance gain we do definitely use it).
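
To illustrate why extracting a column can be this cheap, here is a minimal
standalone sketch in plain C (no R headers). It assumes only that the data
are stored column-major, as R matrices are; the name copy_column is
hypothetical and not part of R or of the code further down.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy column `col` (0-based) of a column-major
   matrix with `nrow` rows into `out`, which must hold `nrow` doubles.
   No bounds checks, for simplicity. */
void copy_column(const double *m, size_t nrow, size_t col, double *out)
{
    /* In column-major storage the elements of one column are contiguous,
       starting at offset col * nrow, so one memcpy is enough. */
    memcpy(out, m + col * nrow, nrow * sizeof(double));
}
```

For a 3x2 matrix stored as {1, 2, 3, 4, 5, 6}, copy_column(m, 3, 1, out)
fills out with the second column, 4, 5, 6.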

My point is: should not [-operators for atomic objects and @.Data be
redesigned? The code here is just an example for double storage-mode and
without any checks though. Adding checks and colnames etc does not lead
to performance degradation.
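
One plausible reason the checks are cheap is that validating the indices
costs time proportional to the number of requested columns, not to the
number of rows. The following is a standalone sketch in plain C under that
assumption; copy_columns and its return convention are hypothetical and are
not the code I timed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the columns listed in `idx` (1-based, as in R)
   from a column-major nrow x ncol matrix `m` into `out`, which must hold
   nrow * nidx doubles. Returns 0 on success, or -1 if any index is out of
   range, in which case `out` is left untouched. */
int copy_columns(const double *m, size_t nrow, size_t ncol,
                 const int *idx, size_t nidx, double *out)
{
    size_t k;

    /* Validate every index first: O(nidx) work, independent of nrow. */
    for (k = 0; k < nidx; k++)
        if (idx[k] < 1 || (size_t) idx[k] > ncol)
            return -1;

    /* Each requested column is one contiguous block of nrow doubles. */
    for (k = 0; k < nidx; k++)
        memcpy(out + k * nrow,
               m + ((size_t) idx[k] - 1) * nrow,
               nrow * sizeof(double));
    return 0;
}
```

With 1e7 rows and a handful of columns, the validation loop runs a handful
of times while each memcpy moves 1e7 doubles, so the checks should be
negligible next to the copy.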

I was originally thinking that the dispatch looking up a particular [
implementation for an object is the issue, but in fact it is not the
case as redefining [ or $ as S4 methods (!) to use the mcol below for an
S4 object shows the same performance gains as the direct use of
mcol/mcols!

Any comments welcome!


## --- code ----------------------------------------------------
## available from CRAN, needs compilers installed
library(inline)

## get 1 column of a matrix to use instead of [-operator
## (same performance gains if index is a character or on multiple columns or
## when getting multiple columns as matrix and assigning the names from input)

body = "/* test for column extraction: no checks here for code simplicity */
    int nrow = Rf_nrows(m);
    int i = INTEGER(index)[0] - 1;
    SEXP res;
    PROTECT(res = allocVector(REALSXP, nrow));
    memcpy(REAL(res), &(REAL(m)[i*nrow]), nrow*sizeof(double));
    UNPROTECT(1);
    return res;"

mcol = cfunction(signature(m="matrix", index="integer"), body=body, 
    includes="#include <string.h>")
    
## get A COPY of the @.Data slot from an object derived from numeric/matrix

body = "/* test performance of getting A COPY of @.Data, keeping dimnames */
    int nrow = Rf_nrows(m);
    int ncol = Rf_ncols(m);
    SEXP res, dim;
    PROTECT(res = allocVector(REALSXP, nrow*ncol));
    PROTECT(dim = allocVector(INTSXP, 2));
    INTEGER(dim)[0] = nrow;
    INTEGER(dim)[1] = ncol;
    SET_DIM(res, dim);
    if (GET_DIMNAMES(m)!= R_NilValue)
        SET_DIMNAMES(res, Rf_duplicate(GET_DIMNAMES(m)));
    if (ncol>0 && nrow>0)
        memcpy(REAL(res), REAL(m), nrow*ncol*sizeof(double));
    UNPROTECT(2);
    return res;"

mcols = cfunction(signature(m="matrix"), body=body, 
    includes="#include <string.h>")

## --- tests ---------------------------------------------------
m = matrix(runif(3e7), ncol=3)

setClass("MyClass", representation("matrix", comment="character"))
dat = new("MyClass", m, comment="test object")

mean(sapply(1:20, function(i) system.time(dat@.Data)[1] ))
## output: [1] 0.2526
mean(sapply(1:20, function(i) system.time(mcols(dat))[1] ))
## output: [1] 0.08215

mean(sapply(1:50, function(i) system.time(m[,2])[1] ))
## output: [1] 0.1222
mean(sapply(1:50, function(i) system.time(mcol(m,2L))[1] ))
## output: [1] 0.02596
mean(sapply(1:50, function(i) system.time(dat[,2])[1] ))
## output: [1] 0.1269
mean(sapply(1:50, function(i) system.time(mcol(dat,2L))[1] ))
## output: [1] 0.02584

---
> sessionInfo()
R version 2.9.0 Under development (unstable) (2009-02-02 r47821) 
x86_64-unknown-linux-gnu 

locale:
C

attached base packages:
[1] stats     graphics  utils     datasets  grDevices methods   base


other attached packages:
[1] inline_0.3.3



Dr Oleg Sklyar
Research Technologist
AHL / Man Investments Ltd
+44 (0)20 7144 3107
osklyar at maninvestments.com
