[Rd] Simple performance enhancement for ave

Hadley Wickham hadley at rice.edu
Wed May 5 18:50:40 CEST 2010


n <- 100000
grp1 <- sample(1:750, n, replace = TRUE)
grp2 <- sample(1:750, n, replace = TRUE)
d <- data.frame(x = rnorm(n), y = rnorm(n), grp1 = grp1, grp2 = grp2)

system.time(ave(d$x, d$grp1, d$grp2, FUN = mean))
#   user  system elapsed
# 19.840   0.125  19.967
system.time(ave(d$x, d$grp1, d$grp2, drop = TRUE, FUN = mean))
#  user  system elapsed
# 2.898   0.058   2.956

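This is a pathological example (100,000 observations falling into around
90,000 groups out of 750 x 750 = 562,500 possible combinations), but I
don't see any reason why drop = TRUE shouldn't be the default inside ave:
without it, interaction() creates a level for every possible combination,
and FUN is called once per level, even for the empty ones.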
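A minimal sketch of what a drop-by-default ave could look like, assuming (as in the stats::ave source) that ave builds its grouping factor with interaction(...) and then uses split() and lapply(). The name ave_drop is hypothetical and not part of base R, and the small data set below is made up for illustration; no timings are claimed for it.

```r
# Hypothetical helper, not part of base R: the same steps as stats::ave,
# except that the grouping factor keeps only the observed combinations.
ave_drop <- function(x, ..., FUN = mean) {
  if (missing(...)) {
    x[] <- FUN(x)                       # no grouping: one value for all of x
  } else {
    g <- interaction(..., drop = TRUE)  # drop unused level combinations
    split(x, g) <- lapply(split(x, g), FUN)
  }
  x
}

# Small made-up data; explicit levels make the full grid exactly 50 * 50.
set.seed(1)
n <- 1000
grp1 <- factor(sample(1:50, n, replace = TRUE), levels = 1:50)
grp2 <- factor(sample(1:50, n, replace = TRUE), levels = 1:50)
x <- rnorm(n)

nlevels(interaction(grp1, grp2))               # 2500: every possible pair
nlevels(interaction(grp1, grp2, drop = TRUE))  # only the pairs that occur (at most n)

# Both approaches assign the same group mean to every observation.
all.equal(ave(x, grp1, grp2, FUN = mean), ave_drop(x, grp1, grp2, FUN = mean))
```

The results agree because the empty groups produced without drop = TRUE contain no observations, so the values computed for them are never written back into x; they only cost time.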

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
