[R] Performance enhancement for ave

Hadley Wickham hadley at rice.edu
Tue Jun 29 15:11:25 CEST 2010


On Tue, Jun 29, 2010 at 8:02 AM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
>> dt = data.table(d,key="grp1,grp2")
>> system.time(ans1 <- dt[ , list(mean(x),mean(y)) , by=list(grp1,grp2)])
>   user  system elapsed
>   3.89    0.00    3.91
> # your 7.064 is 12.23 for me though, so this 3.9 should be faster for you
>
> However, Rprof() shows that 3.9 is mostly dispatch of mean to mean.default
> which then calls .Internal.  Because there are so many groups here, dispatch
> bites.
>
> So ...
>
>> system.time(ans2 <- dt[ , list(.Internal(mean(x)),.Internal(mean(y))),
>> by=list(grp1,grp2)])
>   user  system elapsed
>   0.20    0.00    0.21

Of course, we can perform the same optimisation with ave:

fast_mean <- function(x) .Internal(mean(x))
system.time({
  d$avx <- ave(d$x, interaction(d$grp1, d$grp2, drop = TRUE), FUN = fast_mean)
  d$avy <- ave(d$y, interaction(d$grp1, d$grp2, drop = TRUE), FUN = fast_mean)
})
#  user  system elapsed
# 3.109   0.188   3.302

Regardless, my point is that there's a simple fix available to make
ave much faster, not that it's the fastest thing out there.
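
Since .Internal is not part of R's public interface, a similar effect can probably be had without it. The sketch below is only an illustration under stated assumptions: the data frame is synthetic (the d used above is not shown in this message), and light_mean is a hypothetical helper that skips S3 dispatch by calling sum() and length() directly. It ignores NA handling and the trim argument, so it is not a drop-in replacement for mean().

```r
# Synthetic stand-in for the data in this thread; sizes and the number of
# groups are assumptions chosen only to keep the example quick to run.
set.seed(1)
n <- 1e4
d <- data.frame(
  x    = rnorm(n),
  grp1 = sample(50, n, replace = TRUE),
  grp2 = sample(50, n, replace = TRUE)
)

# Build the grouping factor once; drop = TRUE discards unused combinations.
g <- interaction(d$grp1, d$grp2, drop = TRUE)

# Hypothetical lightweight mean: no method dispatch, no NA removal, no trim.
light_mean <- function(x) sum(x) / length(x)

via_mean  <- ave(d$x, g, FUN = mean)        # goes through mean -> mean.default
via_light <- ave(d$x, g, FUN = light_mean)  # calls primitives only

# The two results should agree up to floating-point rounding.
stopifnot(isTRUE(all.equal(via_mean, via_light)))
```

Wrapping each ave() call in system.time() on your own data would show whether the dispatch saving matters at your number of groups.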

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


