[Rd] Revert to R 3.2.x code of logicalSubscript in subscript.c?

Tue Oct 3 02:00:25 CEST 2017

Suharto,

If you're interested in performance with subscripting, you might want
to look at pqR (pqR-project.org).  It has some substantial performance
improvements for subscripting over R Core versions.  This is
especially true for the current development version of pqR (probably
leading to a new release in about a month).

You can look at a somewhat-stable snapshot of recent pqR development at

  https://github.com/radfordneal/pqR/tree/05e32fa6

In particular, src/main/subscript.c might be of interest.

Note that you should read mods-dir/README if you want to build this,
and in particular, you need to run create-configure in the top-level
source directory first.

I modified your tests a bit, including producing versions using both
vectors of length 1e8 like you did (which will not fit in cache) and
vectors of length 1e5 (which will fit in at least the L3 cache).  I
ran tests on an Intel Skylake processor (E3-1270v5 @ 3.6GHz), using
gcc 7.2 with -O3 -march=native -mtune=native.
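
For anyone wanting to rerun these timings, the seven tests could be
wrapped in one function along the following lines.  This is only a
sketch: the name time_subscripts, its arguments, and the T1 to T7
labels inside it are illustrative and not part of the original test
script.  Calling through small closures also adds a little overhead
compared with typing the expressions at the prompt.

```r
# Illustrative harness (hypothetical name): times the seven subscripting
# tests for vectors of length N, repeating each one R times.
time_subscripts <- function(N, R) {
  x <- numeric(N)
  all_false <- rep(FALSE, N)   # full-length logical index, no recycling
  all_true  <- rep(TRUE, N)
  one_false <- FALSE           # length-1 logical index, recycled
  one_true  <- TRUE
  v <- 2:N                     # index vector built ahead of time

  tests <- list(
    T1 = function() x[all_false],
    T2 = function() x[one_false],
    T3 = function() x[all_true],
    T4 = function() x[one_true],
    T5 = function() x[-1],
    T6 = function() x[2:length(x)],
    T7 = function() x[v]
  )

  # Elapsed seconds for R repetitions of each test, as a named vector.
  vapply(tests, function(f) {
    system.time(for (r in seq_len(R)) a <- f())[["elapsed"]]
  }, numeric(1))
}
```

With this, time_subscripts(1e8, 5) would correspond to the large-vector
runs and time_subscripts(1e5, 1000) to the small-vector runs.  Note that
the sketch keeps all the index vectors alive at once, so the large case
needs noticeably more memory than running the tests one at a time.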

I got the following results with R-3.4.2 (with R_ENABLE_JIT=0, which
is slightly faster than using the JIT compiler):

R-3.4.2, LARGE VECTORS: 

  > N <- 1e8; R <- 5
  > #N <- 1e5; R <- 1000
  > 
  > x <- numeric(N)
  > i <- rep(FALSE, length(x))# no recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed 
    0.296   0.000   0.297 
  > i <- FALSE# recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed 
    0.416   0.000   0.418 
  > 
  > x <- numeric(N)
  > i <- rep(TRUE, length(x))# no recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed 
    1.416   0.352   1.773 
  > i <- TRUE# recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed 
    1.348   0.264   1.613 
  > 
  > x <- numeric(N)
  > system.time(for (r in 1:R) a <- x[-1])
     user  system elapsed 
    1.516   0.376   1.895 
  > system.time(for (r in 1:R) a <- x[2:length(x)])
     user  system elapsed 
    1.516   0.308   1.827 
  > 
  > v <- 2:length(x)
  > system.time(for (r in 1:R) a <- x[v])
     user  system elapsed 
    1.416   0.268   1.689 

R-3.4.2, SMALL VECTORS: 

  > #N <- 1e8; R <- 5
  > N <- 1e5; R <- 1000
  > 
  > x <- numeric(N)
  > i <- rep(FALSE, length(x))# no recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed 
    0.088   0.000   0.089 
  > i <- FALSE# recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed 
    0.084   0.000   0.084 
  > 
  > x <- numeric(N)
  > i <- rep(TRUE, length(x))# no recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed 
    0.492   0.020   0.515 
  > i <- TRUE# recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed 
    0.408   0.008   0.420 
  > 
  > x <- numeric(N)
  > system.time(for (r in 1:R) a <- x[-1])
     user  system elapsed 
    0.508   0.004   0.516 
  > system.time(for (r in 1:R) a <- x[2:length(x)])
     user  system elapsed 
    0.464   0.008   0.473 
  > 
  > v <- 2:length(x)
  > system.time(for (r in 1:R) a <- x[v])
     user  system elapsed 
    0.428   0.000   0.428 

Here are the results with the development version of pqR (uncompressed
pointers, no byte compilation):

pqR (devel), LARGE VECTORS:

  > N <- 1e8; R <- 5
  > #N <- 1e5; R <- 1000
  > 
  > x <- numeric(N)
  > i <- rep(FALSE, length(x))# no recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed 
    0.192   0.000   0.193 
  > i <- FALSE# recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed 
    0.436   0.000   0.434 
  > 
  > x <- numeric(N)
  > i <- rep(TRUE, length(x))# no recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed 
    0.768   0.216   0.988 
  > i <- TRUE# recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed 
    0.832   0.272   1.105 
  > 
  > x <- numeric(N)
  > system.time(for (r in 1:R) a <- x[-1])
     user  system elapsed 
    0.280   0.156   0.435 
  > system.time(for (r in 1:R) a <- x[2:length(x)])
     user  system elapsed 
    0.252   0.184   0.436 
  > 
  > v <- 2:length(x)
  > system.time(for (r in 1:R) a <- x[v])
     user  system elapsed 
    0.828   0.168   0.998 

pqR (devel), SMALL VECTORS:

  > #N <- 1e8; R <- 5
  > N <- 1e5; R <- 1000
  > 
  > x <- numeric(N)
  > i <- rep(FALSE, length(x))# no recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed 
    0.040   0.000   0.038 
  > i <- FALSE# recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed 
    0.084   0.000   0.087 
  > 
  > x <- numeric(N)
  > i <- rep(TRUE, length(x))# no recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed 
    0.156   0.036   0.192 
  > i <- TRUE# recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed 
    0.184   0.012   0.195 
  > 
  > x <- numeric(N)
  > system.time(for (r in 1:R) a <- x[-1])
     user  system elapsed 
    0.060   0.012   0.075 
  > system.time(for (r in 1:R) a <- x[2:length(x)])
     user  system elapsed 
    0.052   0.024   0.075 
  > 
  > v <- 2:length(x)
  > system.time(for (r in 1:R) a <- x[v])
     user  system elapsed 
    0.180   0.004   0.182 

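Summarizing elapsed times in seconds, with T1 to T7 denoting the seven
timings in the order they appear above (T1, T2: all-FALSE index without
and with recycling; T3, T4: all-TRUE index without and with recycling;
T5: x[-1]; T6: x[2:length(x)]; T7: x[v]):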

  LARGE VECTORS   T1     T2     T3     T4     T5     T6     T7   

  R-3.4.2:      0.297  0.418  1.773  1.613  1.895  1.827  1.689
  pqR dev:      0.193  0.434  0.988  1.105  0.435  0.436  0.998

  SMALL VECTORS   T1     T2     T3     T4     T5     T6     T7   

  R-3.4.2:      0.089  0.084  0.515  0.420  0.516  0.473  0.428
  pqR dev:      0.038  0.087  0.192  0.195  0.075  0.075  0.182

As one can see, pqR is substantially faster on every test except T2,
where the two are about the same (pqR is a few percent slower).  On T5
and T6 pqR is more than 4 times faster for large vectors and more than
6 times faster for small ones.  This very large advantage is partly
because pqR has special code for efficiently handling things like
x[-1] and x[2:length(x)], so I added the x[v] test (T7) to see what
performance is like when this special handling isn't invoked.
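
To put numbers on the comparison, the ratios of elapsed times can be
computed directly from the summary table.  The following is a small
sketch that only re-enters the figures given above; the variable names
are illustrative.

```r
# Elapsed times (seconds) copied from the summary table above.
tests <- paste0("T", 1:7)
large <- rbind(
  "R-3.4.2" = c(0.297, 0.418, 1.773, 1.613, 1.895, 1.827, 1.689),
  "pqR dev" = c(0.193, 0.434, 0.988, 1.105, 0.435, 0.436, 0.998)
)
small <- rbind(
  "R-3.4.2" = c(0.089, 0.084, 0.515, 0.420, 0.516, 0.473, 0.428),
  "pqR dev" = c(0.038, 0.087, 0.192, 0.195, 0.075, 0.075, 0.182)
)
colnames(large) <- colnames(small) <- tests

# Ratio of R-3.4.2 time to pqR time: values above 1 mean pqR is faster.
speedup <- rbind(
  LARGE = large["R-3.4.2", ] / large["pqR dev", ],
  SMALL = small["R-3.4.2", ] / small["pqR dev", ]
)
print(round(speedup, 2))
```

The T2 column comes out just under 1 in both rows, and the T5 and T6
columns are the largest, which is where the special handling applies.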

There's no particular reason pqR's code for these operations couldn't
be adapted for use in the R Core implementation, though there are
probably a few issues involving large vectors, and the special
handling of x[2:length(x)] would require implementing pqR's internal
"variant result" mechanism.  pqR also has much faster code for some
other subset and subset assignment operations.

   Radford Neal