[Rd] cache most-recent dispatch

John Chambers jmc at r-project.org
Tue Jul 2 17:41:51 CEST 2013


It's hard to see how repeated dispatch on the same classes can be that 
slow, _if_ the function being called each time is itself doing some 
substantial work.

The first call (in a session) with a particular signature searches for 
inherited methods and stores the method found in a table.  The following 
calls with that signature should do a single lookup in a hash table. 
Caching the last signature is unlikely to be dramatically faster, but we 
can experiment and see.

What is substantially different is calling a generic function vs calling 
a primitive or internal.  If the local paste you constructed is the 
default method, base::paste, its body is just a .Internal call.
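
As a quick check of that point, the following sketch inspects base::paste directly. The variable names are only illustrative, and the exact arguments inside the .Internal call differ between R versions, so no output is shown.

```r
# base::paste is an ordinary closure, not a primitive ...
paste_is_primitive <- is.primitive(base::paste)

# ... and its entire body is a single call to .Internal.
b <- body(base::paste)
is_internal_call <- is.call(b) && identical(b[[1]], as.name(".Internal"))

print(b)
print(c(primitive = paste_is_primitive, internal = is_internal_call))
```

By contrast, a function such as sum is a true primitive, for which is.primitive(sum) is TRUE.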

Not going through the R generic function several thousand times would 
make a difference.
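
A minimal sketch of that comparison, using only base R and the methods package. The generic joinPlus and the helpers via_generic and via_method are hypothetical names made up for this illustration, not part of any package; timings depend on the machine and R version, so none are quoted.

```r
library(methods)

# Hypothetical generic with one method, used only for this illustration.
setGeneric("joinPlus", function(x) standardGeneric("joinPlus"))
setMethod("joinPlus", "character", function(x) paste(x, collapse = "+"))

# Same test data as in the original post: 1300 character vectors of length 2.
lst <- split(rep(LETTERS, 100), rep(1:1300, 2))

# Go through the generic (and hence method dispatch) for every element.
via_generic <- function(x) sapply(x, joinPlus)

# Look the method up once, then call it as an ordinary function.
via_method <- function(x) {
    f <- selectMethod("joinPlus", class(x[[1]]))
    sapply(x, f)
}

# Both routes must give the same answer.
stopifnot(identical(via_generic(lst), via_method(lst)))

# Compare elapsed time over repeated runs.
print(system.time(for (i in 1:20) via_generic(lst)))
print(system.time(for (i in 1:20) via_method(lst)))
```

Selecting the method once is only safe when every element really has the same class, which is the assumption the loop in the original post makes as well.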

It's a fundamental point about R that function calls do enough work that 
they add significant time to a "trivial" computation, such as a 
primitive call.  There are various efforts going on these days to 
provide more efficient alternatives.  They're all helpful; my personal 
favorite when the game is worth it is to consider doing key computations 
in a seriously faster language, like C++ via Rcpp.

John

On 7/1/13 10:04 PM, Valerie Obenchain wrote:
> Hi,
>
> S4 method dispatch can be very slow. Would it be reasonable to cache the
> most recent dispatch, anticipating that the next invocation will be on
> the same type? This would be very helpful in loops.
>
>    fun0 <- function(x)
>        sapply(x, paste, collapse="+")
>    fun1 <- function(x) {
>        paste <- selectMethod(paste, class(x[[1]]))
>        sapply(x, paste, collapse="+")
>    }
>    lst <- split(rep(LETTERS, 100), rep(1:1300, 2))
>
>    library(microbenchmark)
>    microbenchmark(fun0(lst), times=10)
>    ## Unit: milliseconds
>    ##       expr      min       lq   median      uq      max neval
>    ##  fun0(lst) 4.153287 4.180659 4.513539 5.19261 5.280481    10
>
>    setGeneric("paste")
>    microbenchmark(fun0(lst), fun1(lst), times=10)
>    ## Unit: milliseconds
>    ##       expr       min       lq    median        uq       max neval
>    ##  fun0(lst) 21.093180 21.27616 21.453174 21.833686 24.758791    10
>    ##  fun1(lst)  4.517808  4.53067  4.582641  4.682235  5.121856    10
>
> Dispatch seems to be especially slow when packages are involved, e.g.,
> with the Bioconductor IRanges package
> (http://bioconductor.org/packages/release/bioc/html/IRanges.html):
>
>    removeGeneric("paste")
>    library(IRanges)
>    showMethods(paste)
>    ## Function: paste (package BiocGenerics)
>    ## ...="ANY"
>    ## ...="Rle"
>    selectMethod(paste, "ANY")
>    ## Method Definition (Class "derivedDefaultMethod"):
>    ##
>    ## function (..., sep = " ", collapse = NULL)
>    ## .Internal(paste(list(...), sep, collapse))
>    ## <environment: namespace:base>
>    ##
>    ## Signatures:
>    ##         ...
>    ## target  "ANY"
>    ## defined "ANY"
>
>    microbenchmark(fun0(lst), fun1(lst), times=10)
>    ## Unit: milliseconds
>    ##       expr        min         lq     median         uq        max neval
>    ##  fun0(lst) 233.539585 234.592491 236.311209 237.268506 243.181123    10
>    ##  fun1(lst)   4.564914   4.592996   4.642898   4.729009   5.492706    10
>
>    sessionInfo()
>    ## R version 3.0.0 Patched (2013-04-04 r62492)
>    ## Platform: x86_64-unknown-linux-gnu (64-bit)
>    ##
>    ## locale:
>    ##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>    ##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>    ##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>    ##  [7] LC_PAPER=C                 LC_NAME=C
>    ##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>    ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>    ##
>    ## attached base packages:
>    ## [1] parallel  stats     graphics  grDevices utils     datasets  methods
>    ## [8] base
>    ##
>    ## other attached packages:
>    ## [1] IRanges_1.19.15      BiocGenerics_0.7.2   microbenchmark_1.3-0
>    ##
>    ## loaded via a namespace (and not attached):
>    ## [1] stats4_3.0.0
>
>
> Thanks,
> Valerie


