[R] Using and abusing %>% (was Re: Why can't I access this type?)

Hadley Wickham h.wickham at gmail.com
Sat Mar 28 05:40:21 CET 2015


> I didn't dispute whether '%>%' may be useful -- I just pointed out that it
> is slow.  However, it is only part of the problem: 'filter()' and
> 'select()', although aesthetically pleasing, also seem to be slow:
>
>> all.states <- data.frame(state.x77, Name = rownames(state.x77))
>>
>> f1 <- function()
> +     all.states[all.states$Frost > 150, c("Name", "Frost")]
>>
>> f2 <- function()
> +     subset(all.states, Frost > 150, select = c("Name", "Frost"))
>>
>> f3 <- function() {
> +     filt <- subset(all.states, Frost > 150)
> +     subset(filt, select = c("Name", "Frost"))
> + }
>>
>> f4 <- function()
> +     all.states %>% subset(Frost > 150) %>%
> +         subset(select = c("Name", "Frost"))
>>
>> f5 <- function()
> +     select(filter(all.states, Frost > 150), Name, Frost)
>>
>> f6 <- function()
> +     all.states %>% filter(Frost > 150) %>% select(Name, Frost)
>>
>> mb <- microbenchmark(
> +     f1(), f2(), f3(), f4(), f5(), f6(),
> +     times = 1000L
> + )
>> print(mb, signif = 3L)
> Unit: microseconds
>  expr min   lq      mean median   uq  max neval   cld
>  f1() 115  124  134.8812    129  134 1500  1000 a
>  f2() 128  141  147.4694    145  151 1520  1000 a
>  f3() 303  328  344.3175    338  348 1740  1000  b
>  f4() 458  494  518.0830    510  523 1890  1000   c
>  f5() 806  848  887.7270    875  894 3510  1000    d
>  f6() 971 1010 1056.5659   1040 1060 3110  1000     e
>
> So, using '%>%', but leaving 'filter()' and 'select()' out of the equation,
> as in 'f4()' is only half as bad as the "full" 'dplyr' idiom in 'f6()'.  In
> this case, since we're talking microseconds, the speed-up is negligible but
> that *is* beside the point.

When benchmarking, it's important to consider both the relative and
absolute difference, and to think about how the cost scales as the data
grows. The cost of using %>% is fixed, and 500 µs doesn't seem
like a huge performance penalty to me.
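
One way to check the scaling argument is to time the same operation
with and without %>% at several data sizes. The following is only a
rough sketch, assuming the magrittr and microbenchmark packages are
installed; make_df() and overhead() are hypothetical helpers written
for this illustration, and the row counts are arbitrary.

```r
library(magrittr)        # provides %>%
library(microbenchmark)

# Hypothetical helper: a data frame with n rows of random data.
make_df <- function(n) data.frame(id = seq_len(n), x = rnorm(n))

# Hypothetical helper: median time (in microseconds) of the same
# subset() call written plainly and written with %>%.
overhead <- function(n, times = 100L) {
  df <- make_df(n)
  mb <- microbenchmark(
    plain = subset(df, x > 0),
    piped = df %>% subset(x > 0),
    times = times
  )
  # microbenchmark stores raw timings in nanoseconds
  med <- tapply(mb$time, mb$expr, median) / 1000
  plain <- as.numeric(med["plain"])
  piped <- as.numeric(med["piped"])
  c(n = n, plain = plain, piped = piped,
    diff = piped - plain,      # absolute cost of the pipe
    ratio = piped / plain)     # relative cost of the pipe
}

# One row per data size; sizes chosen only for illustration.
res <- t(sapply(c(50, 5000, 500000), overhead))
print(signif(res, 3))
```

If the cost of the pipe really is fixed, the diff column should stay
roughly flat while the ratio column moves toward 1 as n grows. Timings
are noisy, so treat individual numbers with caution.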

Hadley

-- 
http://had.co.nz/


