[Rd] Suggestion: 20% speed up of which() with two-character mod

Henrik Bengtsson hb at stat.berkeley.edu
Tue Aug 5 06:14:12 CEST 2008


Hi,

I just want to follow up on this very simple
fix/correction/speedup/cleanup of the base::which() function.  Here is
a diff:

diff src/library/base/R/which.R which.R
21c21
<     wh <- seq_along(x)[ll <- x & !is.na(x)]
---
>     wh <- seq_along(x)[x & !is.na(x)]
25c25
<         names(wh) <- names(x)[ll]
---
>         names(wh) <- names(x)[wh]

FYI, the 'll' variable is not used elsewhere.  I've gone through
this modification several times and I cannot see any side effects.
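
To illustrate why the two forms should agree, here is a minimal sketch on a
hypothetical toy vector (not part of the benchmark). It assumes only that
'll' never contains NA, which holds because NA & FALSE is FALSE in R:

```r
# Hypothetical toy input with an NA, chosen only for illustration
x <- c(a = TRUE, b = FALSE, c = NA, d = TRUE)

ll <- x & !is.na(x)        # logical mask; NA & FALSE is FALSE, so no NAs remain
wh <- seq_along(x)[ll]     # integer positions of the TRUE elements

by_mask  <- names(x)[ll]   # current code: index names by the logical mask
by_index <- names(x)[wh]   # suggested code: index names by integer positions

# Both ways of subsetting the names select the same elements
stopifnot(!anyNA(ll), identical(by_mask, by_index))
print(by_index)
```

Since 'wh' holds exactly the positions where 'll' is TRUE, indexing by either
selects the same names in the same order.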

Could someone on R core please commit this?

BTW, when one reports diffs, do you prefer to get them with or without
context information, e.g. -C 3?

/Henrik

On Fri, Jul 11, 2008 at 8:57 AM, Charles C. Berry <cberry at tajo.ucsd.edu> wrote:
> On Thu, 10 Jul 2008, Henrik Bengtsson wrote:
>
>> Hi,
>>
>> by replacing 'll' with 'wh' in the source code for base::which() one
>> gets ~20% speed up for *named logical vectors*.
>
>
> The amount of speedup depends on how sparse the TRUE values are.
>
> When the proportion of TRUEs gets small the speedup is more than twofold on
> my macbook. For high proportions of TRUE, the speedup is more like the 20%
> you cite.
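
A rough way to probe the dependence on sparsity is sketched below. The helper
names (names_by_mask, names_by_index), the vector size, the repeat count and
the proportions are all assumptions made for illustration; they isolate only
the naming step that the patch changes and are not the full which()
implementation. One plausible reason for the effect is that indexing names(x)
by a short integer vector touches fewer elements than indexing by a
full-length logical mask.

```r
# Hypothetical helpers isolating only the step that the patch changes
names_by_mask <- function(x) {
  ll <- x & !is.na(x)
  wh <- seq_along(x)[ll]
  names(wh) <- names(x)[ll]   # current code: logical mask
  wh
}
names_by_index <- function(x) {
  wh <- seq_along(x)[x & !is.na(x)]
  names(wh) <- names(x)[wh]   # suggested code: integer positions
  wh
}

N <- 1e5   # assumed size, smaller than the N = 1e6 used in the benchmark
B <- 20    # assumed number of repetitions
set.seed(1)
for (p in c(0.01, 0.5, 0.99)) {
  # p is the expected proportion of TRUE values
  x <- sample(c(TRUE, FALSE), size = N, replace = TRUE, prob = c(p, 1 - p))
  names(x) <- seq_along(x)
  t_mask  <- system.time(for (bb in 1:B) r1 <- names_by_mask(x))[["elapsed"]]
  t_index <- system.time(for (bb in 1:B) r2 <- names_by_index(x))[["elapsed"]]
  stopifnot(identical(r1, r2))
  cat(sprintf("p(TRUE) = %.2f: mask %.3fs, index %.3fs\n", p, t_mask, t_index))
}
```

Timings will vary by machine and R version, so the printed numbers are only
indicative.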
>
> HTH,
>
> Chuck
>
>>
>> CURRENT CODE:
>>
>> which <- function(x, arr.ind = FALSE)
>> {
>>   if(!is.logical(x))
>>        stop("argument to 'which' is not logical")
>>   wh <- seq_along(x)[ll <- x & !is.na(x)]
>>   m <- length(wh)
>>   dl <- dim(x)
>>   if (is.null(dl) || !arr.ind) {
>>       names(wh) <- names(x)[ll]
>>   }
>>   ...
>>   wh;
>> }
>>
>> SUGGESTED CODE: (Remove 'll' and use 'wh')
>>
>> which2 <- function(x, arr.ind = FALSE)
>> {
>>   if(!is.logical(x))
>>        stop("argument to 'which' is not logical")
>>   wh <- seq_along(x)[x & !is.na(x)]
>>   m <- length(wh)
>>   dl <- dim(x)
>>   if (is.null(dl) || !arr.ind) {
>>       names(wh) <- names(x)[wh]
>>   }
>>   ...
>>   wh;
>> }
>>
>> That's all.
>>
>> BENCHMARKING:
>>
>> # To measure both in same environment
>> which1 <- base::which;
>> environment(which1) <- globalenv();  # Needed?
>>
>> N <- 1e6;
>> set.seed(0xbeef);
>> x <- sample(c(TRUE, FALSE), size=N, replace=TRUE);
>> names(x) <- seq_along(x);
>> B <- 10;
>> t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
>> t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
>> stopifnot(identical(idxs1, idxs2));
>> print(t1/t2);
>> # Fair benchmarking
>> t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
>> t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
>> print(t1/t2);
>> ##      user    system   elapsed
>> ##   1.283186   1.052632   1.250000
>>
>> You get similar results if you put the for loop outside the system.time()
>> call (and sum up the timings).
>>
>> Cheers
>>
>> Henrik
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
> Charles C. Berry                            (858) 534-2098
>                                             Dept of Family/Preventive Medicine
> E mailto:cberry at tajo.ucsd.edu               UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>
>
>


