[R] which() vs. just logical selection in df

1/k^c kch@mber|n @end|ng |rom gm@||@com
Thu Oct 15 04:23:37 CEST 2020


Hi Bert,

Thank you very much! I was unaware that .Internal() referred to C code.

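I figured out the difference. which() reduces the index to just the
matching records first, so the subsetting step receives a short integer
vector. With logical indexing, the full-length logical vector is passed
in and the reduction to matching records happens last, during subsetting.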

> length(index1<-dat$gender2=="other")
[1] 2000000
> length(index2<-which(index1))
[1] 666667
> length(dat[index1,])
[1] 666667
> length(dat[index2,])
[1] 666667

microbenchmark(index1<-dat$gender2=="other", times=100L) # 2e6 records, ~13ms.
microbenchmark(index2<-which(index1), times=100L) # Extra time for which(), ~5ms.
microbenchmark(dat[index1,], times=100L) # Time to return just the TRUE records using the whole 2e6 index, ~99ms.
microbenchmark(dat[index2,], times=100L) # Time to return all records from the shorter index, ~64ms.
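
For anyone who wants to reproduce the comparison, here is a minimal sketch. The original `dat` was not posted, so the data frame below is a made-up stand-in: the `id` column, the three gender levels, and the roughly one-third share of "other" are all assumptions. Timings will differ by machine and R version, and the timing step only runs if the microbenchmark package is installed.

```r
# Hypothetical stand-in for `dat` (not the original data): 2e6 rows,
# with about one third of them having gender2 == "other".
set.seed(1)
n <- 2e6
dat <- data.frame(
  id      = seq_len(n),
  gender2 = sample(c("female", "male", "other"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

index1 <- dat$gender2 == "other"   # logical vector, one element per row
index2 <- which(index1)            # integer positions of the TRUE elements

length(index1)   # same as nrow(dat)
length(index2)   # same as sum(index1)

# Both index forms should select the same rows.
same_rows <- isTRUE(all.equal(dat[index1, ], dat[index2, ]))
same_rows

# Optional timing comparison of the two subsetting calls.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    logical_index = dat[index1, ],
    integer_index = dat[index2, ],
    times = 10L
  ))
}
```

Note that nrow() is the safer way to count the selected records when `dat` has more than one column, since length() on a data frame returns the number of columns.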

Cheers,
Keith


On Wed, Oct 14, 2020 at 4:42 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:
>
> Inline.
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Wed, Oct 14, 2020 at 3:23 PM 1/k^c <kchamberln using gmail.com> wrote:
>
>> Is which() invoking c-level code by chance, making it slightly faster
>> on average?
>
>
> You do not need to ask such questions. R is open source, so just look!
>
> > which
> function (x, arr.ind = FALSE, useNames = TRUE)
> {
>     wh <- .Internal(which(x))   ## C code
>     if (arr.ind && !is.null(d <- dim(x)))
>         arrayInd(wh, d, dimnames(x), useNames = useNames)
>     else wh
> }
> <bytecode: 0x7fcdba0b8e80>
> <environment: namespace:base>


