[R] which() vs. just logical selection in df

1/k^c kch@mber|n @end|ng |rom gm@||@com
Sun Oct 11 01:24:40 CEST 2020


Hi R-helpers,

Does anyone know why adding which() makes row selection faster than
plain logical indexing on a data frame? Doesn't which() technically
add another conversion/function call on top of the logical
comparison? Here is a reproducible example showing a small difference
in timing.

# Surrogate data - the timing here isn't interesting
urltext <- paste("https://drive.google.com/",
                 "uc?id=1AZ-s1EgZXs4M_XF3YYEaKjjMMvRQ7",
                 "-h8&export=download", sep="")
download.file(url=urltext, destfile="tempfile.csv") # download file first
# read the file; 'nrows' is a slight overestimate
dat <- read.csv("tempfile.csv", stringsAsFactors = FALSE, header=TRUE,
                nrows=2.5e6)
dat <- dat[,1:3] # select just the first 3 columns
head(dat, 10) # print the first 10 rows

# Select using which() as the final step ~ 90ms total time on my MacBook Air
system.time(
  head(
    dat[which(dat$gender2=="other"), ]),
  gcFirst=TRUE)

# Select skipping which() ~130ms total time
system.time(
  head(
    dat[dat$gender2=="other", ]),
  gcFirst=TRUE)

Now, I would expect the second version, without which(), to be
faster. However, every time I run these, the first version, with
which(), is faster by about 20ms of system time and 20ms of user
time. Does anyone know why this is?
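
In case the download link is unavailable, here is a minimal sketch of the same comparison on synthetic data. The data frame `sim`, its columns, and the category proportions are made up for illustration and are not the original dataset. It also checks whether the two forms return the same rows, since which() drops NA comparisons while a logical index returns an all-NA row for each NA.

```r
# Synthetic stand-in for the downloaded file (hypothetical data, not the original)
set.seed(1)
n <- 1e5
sim <- data.frame(
  id      = seq_len(n),
  age     = sample(18:90, n, replace = TRUE),
  gender2 = sample(c("female", "male", "other", NA), n, replace = TRUE,
                   prob = c(0.45, 0.45, 0.05, 0.05)),
  stringsAsFactors = FALSE
)

idx_logical <- sim$gender2 == "other"  # logical vector; NA where gender2 is NA
idx_which   <- which(idx_logical)      # integer positions of the TRUE entries only

with_which    <- sim[idx_which, ]
without_which <- sim[idx_logical, ]

# Row counts differ by the number of NA comparisons
nrow(with_which)
nrow(without_which)
sum(is.na(idx_logical))

# Timings depend on machine and R version, so no expected values are given
system.time(for (i in 1:20) sim[idx_which, ])
system.time(for (i in 1:20) sim[idx_logical, ])
```

If gender2 contains no NA values, both forms should select the same rows and only the timing differs.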

Cheers!
Keith


