Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Wed Feb 18 23:51:13 CET 2009

Stavros Macrakis wrote:
> On Thu, Feb 12, 2009 at 4:28 AM, Gavin Simpson <gavin.simpson at ucl.ac.uk> wrote:
>> When I'm testing the speed of things like this (that are in and of themselves
>> very quick) for situations where it may matter, I wrap the function call in a call
>> to replicate():
>> system.time(replicate(1000, svd(Mean_svd_data)))
>> to run it 1000 times, and that allows me to judge how quickly the
>> function executes.
> I do the same, but with a small twist:
>      system.time(replicate(1000, {svd(Mean_svd_data); 0} ))
> This allows the values of svd(...) to be garbage collected.
> If you don't do this and the output of the timed code is large, you
> may allocate large amounts of memory (which may influence your timing
> results) or run out of memory (which will also influence your timing
> results :-) ),
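
a quick way to see what the twist does (a minimal sketch, not from the thread; the vector size and the names kept/dropped are arbitrary): replicate() returns the value of every evaluation, so ending the block with a constant means only that constant is retained per run.

```r
# replicate() collects the value of every evaluation, so what it
# returns grows with the size of each individual result
kept = replicate(3, rnorm(1000), simplify=FALSE)
sapply(kept, length)

# ending the block with a constant keeps only that constant per run;
# the large intermediate value becomes eligible for garbage collection
dropped = replicate(3, { rnorm(1000); NULL }, simplify=FALSE)
sapply(dropped, is.null)

# object.size() gives a rough idea of how much is retained in each case
object.size(kept) > object.size(dropped)
```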

to contribute my few cents, here's a simple benchmarking routine,
inspired by the perl module Benchmark.  it allows one to benchmark an
arbitrary number of expressions with an arbitrary number of
replications, and provides a summary matrix with selected timings.

the code below is also available from google code [1], if anyone is
interested in updates (should there be any) or contributions.

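benchmark = function(
      ...,
      columns=c('test', 'replications', 'user.self', 'sys.self',
         'elapsed', 'user.child', 'sys.child'),
      replicate=100,
      environment=parent.frame()) {
   # capture the unevaluated expressions passed in '...'
   arguments = match.call()[-1]
   parameters = names(arguments)
   if (is.null(parameters))
      parameters = as.character(arguments)
   else {
      # drop the control arguments, keep only the expressions to time
      indices = ! parameters %in% c('columns', 'replicate', 'environment')
      arguments = arguments[indices]
      parameters = parameters[indices] }
   # one row per (expression, replication count) pair
   result = cbind(
      test=rep(ifelse(parameters=='', as.character(arguments), parameters),
         each=length(replicate)),
      as.data.frame(
         do.call(rbind,
            lapply(arguments,
               function(argument)
                  do.call(rbind,
                     lapply(replicate,
                        function(count)
                           c(replications=count,
                              system.time(replicate(count, {
                                 eval(argument, environment); NULL })))))))))
   result[, columns, drop=FALSE] }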

it's rudimentary and not fool-proof, but might be helpful if used with
care.  (the nested do.call-rbind-lapply sequence can surely be
simplified, but i could not resist the pattern.  someone once wrote that
if you need more than three (five?) levels of indentation in your code,
there must be something wrong with it;  presumably, he was a fortran
programmer.)

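for what it's worth, one way the nesting could be flattened is sketched below.  this is an assumption-laden variant, not the code from [1]: benchmark.flat is a hypothetical name, and it takes a named list of quoted expressions rather than capturing them from '...'.  expand.grid enumerates the (expression, replication count) pairs and mapply times each pair.

```r
# hypothetical helper, not part of rbenchmark: a flatter variant that takes
# a named list of quoted expressions instead of capturing them from '...'
benchmark.flat = function(tests, replications=100, environment=parent.frame()) {
   force(environment)
   # one row per (expression, replication count) pair;
   # the replication count varies fastest within each test
   grid = expand.grid(
      replications=replications, test=names(tests), stringsAsFactors=FALSE)
   # mapply returns one column of timings per row of the grid
   timings = mapply(
      function(test, count)
         system.time(replicate(count, {
            eval(tests[[test]], environment); NULL })),
      grid$test, grid$replications, USE.NAMES=FALSE)
   # transpose so that each timing becomes a row next to its test and count
   cbind(grid[c('test', 'replications')], t(timings)) }

benchmark.flat(
   list(rep=quote(mean(rnorm(100))), seq=quote(1:10^3)),
   replications=c(10, 100))
```

the price of the flatter code is that the caller has to quote() the expressions, which the '...' version avoids.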

benchmark(1:10^7)
#     test replications user.self sys.self elapsed user.child sys.child
# 1 1:10^7          100     2.168        0   2.166          0         0

benchmark(allocation=1:10^8, replicate=10)
#         test replications user.self sys.self elapsed user.child sys.child
# 1 allocation           10      0.98    3.073    4.05          0         0

means.rep = function(n, m) replicate(n, mean(rnorm(m)))
means.pat = function(n, m) colMeans(array(rnorm(n*m), c(m, n)))
(result = benchmark(replicate=c(10, 100, 1000),
    rep=means.rep(100, 100),
    pat=means.pat(100, 100),
    columns=c('test', 'replications', 'elapsed')))
#   test replications elapsed
# 1  rep           10   0.037
# 2  rep          100   0.387
# 3  rep         1000   3.840
# 4  pat           10   0.017
# 5  pat          100   0.170
# 6  pat         1000   1.731

with(result, elapsed/replications)
# [1] 0.003700 0.003870 0.003840 0.001700 0.001700 0.001731

with(result, t.test(elapsed/replications ~ test, paired=TRUE))
# silly, i know...
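
in case the formula call is rejected (newer versions of R, as far as i know, no longer accept paired=TRUE together with a formula), the same comparison can be sketched with the default interface.  the data frame below just retypes the timings from the table above so that the snippet runs on its own; per.rep, comparison and means are names made up for the illustration.

```r
# timings copied from the table above
result = data.frame(
   test=rep(c('rep', 'pat'), each=3),
   replications=rep(c(10, 100, 1000), times=2),
   elapsed=c(0.037, 0.387, 3.840, 0.017, 0.170, 1.731))

# elapsed time per single replication
per.rep = with(result, elapsed/replications)

# paired comparison via the default (non-formula) interface;
# the pairs are matched by replication count
comparison = t.test(
   per.rep[result$test == 'rep'],
   per.rep[result$test == 'pat'],
   paired=TRUE)
comparison$estimate

# rough speed ratio: mean per-replication time of 'rep' over 'pat'
means = tapply(per.rep, result$test, mean)
means[['rep']] / means[['pat']]
```

with three pairs the test is still silly, but the ratio is a handy one-number summary.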

manual on demand.

[1] http://code.google.com/p/rbenchmark/

