[Rd] Speed improvement for Find() and Position()

Olaf Mersmann olafm at statistik.tu-dortmund.de
Wed Sep 1 14:06:37 CEST 2010


Dear R-developers,

Both Find() and Position() are currently not optimized in any way, as their documentation mentions. I have rewritten both functions to be more efficient by replacing the sapply() call with a for() loop that terminates as soon as a match is found. Here is a patch against the current Subversion HEAD:

  http://www.statistik.tu-dortmund.de/~olafm/temp/fp.patch

and here are some numbers to show that this change is worthwhile:

% cat fp_bench.R 
set.seed(42)
pred <- function(z) z == 1

for (n in c(10^(2:4))) {
  x <- sample(1:n, 2*n, replace=TRUE)
  
  tf <- system.time(replicate(1000L, Find(pred, x)))
  message(sprintf("Find    : n=%5i user=%6.3f system=%6.3f",
                  2*n, tf[1], tf[2]))

  tp <- system.time(replicate(1000L, Position(pred, x)))
  message(sprintf("Position: n=%5i user=%6.3f system=%6.3f",
                  2*n, tp[1], tp[2]))
}

## Unpatched R:
% Rscript fp_bench.R 
Find    : n=  200 user= 0.491 system= 0.015
Position: n=  200 user= 0.477 system= 0.014
Find    : n= 2000 user= 4.450 system= 0.083
Position: n= 2000 user= 4.507 system= 0.094
Find    : n=20000 user=63.435 system= 1.497
Position: n=20000 user=63.130 system= 1.328

## Patched R:
% ./bin/Rscript fp_bench.R
Find    : n=  200 user= 0.101 system= 0.013
Position: n=  200 user= 0.085 system= 0.003
Find    : n= 2000 user= 0.781 system= 0.002
Position: n= 2000 user= 0.809 system= 0.012
Find    : n=20000 user=20.537 system= 0.394
Position: n=20000 user=20.502 system= 0.404
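
For readers who do not want to open the patch, the early-exit idea described above can be sketched roughly as follows. This is only an illustration and not the patch itself: the names position_early() and find_early() are made up, and treating any predicate result other than a single TRUE as "no match" (via isTRUE()) is an assumption of this sketch.

```r
## Illustrative sketch only, NOT the actual patch.
## position_early() and find_early() are hypothetical names.
position_early <- function(f, x, right = FALSE, nomatch = NA_integer_) {
  f <- match.fun(f)
  ## Visit indices left-to-right, or right-to-left when right = TRUE.
  ind <- if (right) rev(seq_along(x)) else seq_along(x)
  for (i in ind) {
    ## Return as soon as the predicate holds, so the remaining
    ## elements are never evaluated.
    ## Assumption: anything other than a single TRUE counts as no match.
    if (isTRUE(f(x[[i]]))) return(i)
  }
  nomatch
}

find_early <- function(f, x, right = FALSE, nomatch = NULL) {
  ## Reuse the position search; 0L signals "not found" internally.
  pos <- position_early(f, x, right, nomatch = 0L)
  if (pos > 0L) x[[pos]] else nomatch
}

## Small usage example
x <- c(3L, 1L, 4L, 1L, 5L)
position_early(function(z) z == 1, x)                # index of first match
position_early(function(z) z == 1, x, right = TRUE)  # index of last match
find_early(function(z) z > 3, x)                     # first matching element
```

In the best case (a match near the start of x) such a loop calls the predicate only a handful of times, whereas applying the predicate to every element with sapply() always costs length(x) calls.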

Cheers,
Olaf Mersmann

