[Rd] Suggestion: 20% speed up of which() with two-character mod

Charles C. Berry cberry at tajo.ucsd.edu
Fri Jul 11 17:57:56 CEST 2008


On Thu, 10 Jul 2008, Henrik Bengtsson wrote:

> Hi,
>
> by replacing 'll' with 'wh' in the source code for base::which() one
> gets ~20% speed up for *named logical vectors*.


The amount of speedup depends on how sparse the TRUE values are.

When the proportion of TRUEs gets small the speedup is more than twofold 
on my macbook. For high proportions of TRUE, the speedup is more like the 
20% you cite.

HTH,

Chuck

>
> CURRENT CODE:
>
> which <- function(x, arr.ind = FALSE)
> {
>    if(!is.logical(x))
> 	stop("argument to 'which' is not logical")
>    wh <- seq_along(x)[ll <- x & !is.na(x)]
>    m <- length(wh)
>    dl <- dim(x)
>    if (is.null(dl) || !arr.ind) {
>        names(wh) <- names(x)[ll]
>    }
>    ...
>    wh;
> }
>
> SUGGESTED CODE: (Remove 'll' and use 'wh')
>
> which2 <- function(x, arr.ind = FALSE)
> {
>    if(!is.logical(x))
> 	stop("argument to 'which' is not logical")
>    wh <- seq_along(x)[x & !is.na(x)]
>    m <- length(wh)
>    dl <- dim(x)
>    if (is.null(dl) || !arr.ind) {
>        names(wh) <- names(x)[wh]
>    }
>    ...
>    wh;
> }
>
> That's all.
>
> BENCHMARKING:
>
> # To measure both in same environment
> which1 <- base::which;
> environment(which1) <- globalenv();  # Needed?
>
> N <- 1e6;
> set.seed(0xbeef);
> x <- sample(c(TRUE, FALSE), size=N, replace=TRUE);
> names(x) <- seq_along(x);
> B <- 10;
> t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
> t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
> stopifnot(identical(idxs1, idxs2));
> print(t1/t2);
> # Fair benchmarking
> t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
> t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
> print(t1/t2);
> ##      user    system   elapsed
> ##   1.283186   1.052632   1.250000
>
> You get similar results if you put for loop outside the system.time()
> call (and sum up the timings).
>
> Cheers
>
> Henrik
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



More information about the R-devel mailing list