[Rd] Suggestion: 20% speed up of which() with two-character mod

Henrik Bengtsson hb at stat.berkeley.edu
Fri Jul 11 05:18:30 CEST 2008


Hi,

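by replacing 'll' with 'wh' in the source code for base::which(), one
gets a ~20% speedup for *named logical vectors*. With this change, the
names are looked up with the integer index vector 'wh' instead of the
full-length logical vector 'll'.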

CURRENT CODE:

which <- function(x, arr.ind = FALSE)
{
    if(!is.logical(x))
        stop("argument to 'which' is not logical")
    wh <- seq_along(x)[ll <- x & !is.na(x)]
    m <- length(wh)
    dl <- dim(x)
    if (is.null(dl) || !arr.ind) {
        names(wh) <- names(x)[ll]
    }
    ...
    wh;
}

SUGGESTED CODE: (Remove 'll' and use 'wh')

which2 <- function(x, arr.ind = FALSE)
{
    if(!is.logical(x))
        stop("argument to 'which' is not logical")
    wh <- seq_along(x)[x & !is.na(x)]
    m <- length(wh)
    dl <- dim(x)
    if (is.null(dl) || !arr.ind) {
        names(wh) <- names(x)[wh]
    }
    ...
    wh;
}

That's all.
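
As a quick sanity check of why the two forms are interchangeable, here is a minimal sketch on a toy vector (the input below is made up for illustration and is not from the benchmark): indexing names(x) with the logical mask and with the integer positions should select the same names, including when 'x' contains NA.

```r
# Hypothetical toy input, chosen only to illustrate the equivalence
x <- c(a = TRUE, b = FALSE, c = NA, d = TRUE)

ll <- x & !is.na(x)        # logical mask; the NA element becomes FALSE
wh <- seq_along(x)[ll]     # integer positions of the TRUE elements

# Subset the names both ways and compare
names_by_mask  <- names(x)[ll]   # what the current code does
names_by_index <- names(x)[wh]   # what the suggested code does
stopifnot(identical(names_by_mask, names_by_index))
```

The integer vector 'wh' only has as many elements as there are TRUE values, whereas 'll' always has length(x) elements.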

BENCHMARKING:

# To measure both in same environment
which1 <- base::which;
environment(which1) <- globalenv();  # Needed?

N <- 1e6;
set.seed(0xbeef);
x <- sample(c(TRUE, FALSE), size=N, replace=TRUE);
names(x) <- seq_along(x);
B <- 10;
t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
stopifnot(identical(idxs1, idxs2));
print(t1/t2);
# Fair benchmarking: repeat with the order of the two calls swapped
t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
print(t1/t2);
##      user    system   elapsed
##   1.283186   1.052632   1.250000

You get similar results if you put the for loop outside the
system.time() call (and sum up the timings).
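
For reference, that variant could look roughly like the sketch below. Since the function bodies above are elided, it uses two simplified stand-ins, which_ll() and which_wh() (hypothetical names, covering only the plain-vector case without 'arr.ind'), and a smaller N than in the benchmark above so that it runs quickly.

```r
# Simplified stand-ins (hypothetical names); plain named vectors only
which_ll <- function(x) {
    wh <- seq_along(x)[ll <- x & !is.na(x)]
    names(wh) <- names(x)[ll]      # names via the logical mask
    wh
}
which_wh <- function(x) {
    wh <- seq_along(x)[x & !is.na(x)]
    names(wh) <- names(x)[wh]      # names via the integer indices
    wh
}

N <- 1e5                           # assumption: smaller than the 1e6 used above
set.seed(0xbeef)
x <- sample(c(TRUE, FALSE), size = N, replace = TRUE)
names(x) <- seq_along(x)

B <- 10
t1 <- t2 <- 0                      # accumulated timings
for (bb in 1:B) {
    # Time each call separately and sum up, alternating the two functions
    t1 <- t1 + system.time(idxs1 <- which_ll(x))
    t2 <- t2 + system.time(idxs2 <- which_wh(x))
}
stopifnot(identical(idxs1, idxs2))
print(t1 / t2)
```

Alternating the two calls inside the loop is just one way to spread any ordering effects over both functions. The ratio will vary with machine and R version.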

Cheers

Henrik


