[R] Timings of function execution in R [was Re: R in Industry]

Fri Feb 9 23:13:49 CET 2007

>>>>> "Duncan" == Duncan Murdoch <murdoch at stats.uwo.ca>
>>>>>     on Fri, 09 Feb 2007 13:52:25 -0500 writes:

    Duncan> On 2/9/2007 1:33 PM, Prof Brian Ripley wrote:
    >>> x <- rnorm(10000)
    >>> system.time(for(i in 1:1000) pmax(x, 0))
    >> user  system elapsed
    >> 4.43    0.05    4.54
    >>> pmax2 <- function(k,x) (x+k + abs(x-k))/2
    >>> system.time(for(i in 1:1000) pmax2(x, 0))
    >> user  system elapsed
    >> 0.64    0.03    0.67
    >>> pm <- function(x) {z <- x<0; x[z] <- 0; x}
    >>> system.time(for(i in 1:1000) pm(x))
    >> user  system elapsed
    >> 0.59    0.00    0.59
    >>> system.time(for(i in 1:1000) pmax.int(x, 0))
    >> user  system elapsed
    >> 0.36    0.00    0.36
    >> 
    >> So at least on one system Thomas' solution is a little faster, but a 
    >> C-level n-args solution handling NAs is quite a lot faster.

    Duncan> For this special case we can do a lot better using

    Duncan> pospart <- function(x) (x + abs(x))/2

Indeed, that's what I meant when I talked about doing the
special case 'k = 0' explicitly -- and also what my timings
were based on.

Thank you Duncan -- and Brian for looking into providing an even
faster and more general C-internal version!
Martin
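
A quick check of the 'k = 0' identity, as a sketch only: pospart and pm
are the functions quoted in this thread, while the test vector and the
seed are made up for illustration. It also shows two of the "strong
assumptions" of the arithmetic form: for -Inf the sum -Inf + Inf is NaN,
and integer input comes back as double.

```r
pospart <- function(x) (x + abs(x))/2            # Duncan's special case
pm      <- function(x) {z <- x<0; x[z] <- 0; x}  # Thomas' version

set.seed(1)            # arbitrary seed, for reproducibility only
x <- rnorm(10000)

## On finite doubles all three agree
stopifnot(all.equal(pospart(x), pmax(x, 0)),
          all.equal(pm(x),      pmax(x, 0)))

## Edge cases where the arithmetic shortcut differs from pmax()
pospart(-Inf)          # NaN, since -Inf + Inf is NaN
pmax(-Inf, 0)          # 0
pospart(c(-1L, 2L))    # double, whereas pmax(c(-1L, 2L), 0L) stays integer
```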

    Duncan> The less specialized function

    Duncan> pmax2 <- function(x,y) {
    Duncan> diff <- x - y
    Duncan> y + (diff + abs(diff))/2
    Duncan> }

    Duncan> is faster on my system than pm, but not as fast as pospart:

    >> system.time(for(i in 1:1000) pm(x))
    Duncan> [1] 0.77 0.01 0.78   NA   NA
    >> system.time(for(i in 1:1000) pospart(x))
    Duncan> [1] 0.27 0.02 0.28   NA   NA
    >> system.time(for(i in 1:1000) pmax2(x,0))
    Duncan> [1] 0.47 0.00 0.47   NA   NA

    Duncan> Duncan Murdoch
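
To repeat these timings on another machine, something like the
following should do. The helper name time1000 is made up here (it is not
part of R), and the absolute numbers will differ by system and R
version, so none are shown.

```r
x <- rnorm(10000)

pm      <- function(x) {z <- x<0; x[z] <- 0; x}
pospart <- function(x) (x + abs(x))/2
pmax2   <- function(x,y) {
    diff <- x - y
    y + (diff + abs(diff))/2
}

## hypothetical helper: elapsed seconds for 1000 calls of f(...)
time1000 <- function(f, ...)
    system.time(for(i in 1:1000) f(...))[["elapsed"]]

res <- c(pmax    = time1000(pmax, x, 0),
         pm      = time1000(pm, x),
         pospart = time1000(pospart, x),
         pmax2   = time1000(pmax2, x, 0))
print(res)
```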

    >> 
    >> On Fri, 9 Feb 2007, Martin Maechler wrote:
    >> 
    >>>>>>>> "TL" == Thomas Lumley <tlumley at u.washington.edu>
    >>>>>>>> on Fri, 9 Feb 2007 08:13:54 -0800 (PST) writes:
    >>> 
    TL> On 2/9/07, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
    >>> >>> The other reason why pmin/pmax are preferable to your functions is that
    >>> >>> they are fully generic.  It is not easy to write C code which takes into
    >>> >>> account that <, [, [<- and is.na are all generic.  That is not to say that
    >>> >>> it is not worth having faster restricted alternatives, as indeed we do
    >>> >>> with rep.int and seq.int.
    >>> >>>
    >>> >>> Anything that uses arithmetic is making strong assumptions about the
    >>> >>> inputs.  It ought to be possible to write a fast C version that worked for
    >>> >>> atomic vectors (logical, integer, real and character), but is there
    >>> >>> any evidence of profiled real problems where speed is an issue?
    >>> 
    >>> 
    TL> I had an example just last month of an MCMC calculation where
    TL> profiling showed that pmax(x,0) was taking about 30% of the
    TL> total time.  I used
    >>> 
    TL> function(x) {z <- x<0; x[z] <- 0; x}
    >>> 
    TL> which was significantly faster. I didn't try the
    TL> arithmetic solution.
    >>> 
    >>> I did - eons ago as mentioned in my message earlier in this
    >>> thread. I can assure you that those (also mentioned)
    >>> 
    >>> pmin2 <- function(k,x) (x+k - abs(x-k))/2
    >>> pmax2 <- function(k,x) (x+k + abs(x-k))/2
    >>> 
    >>> are faster still, particularly if you hardcode the special case of k=0!
    >>> {that's how I came by these:  pmax(x,0) is also denoted  x_+, and
    >>> x_+ := (x + |x|)/2
    >>> x_- := (x - |x|)/2
    >>> }
    >>> 
    TL> Also, I didn't check if a solution like this would still
    TL> be faster when both arguments are vectors (but there was
    TL> a recent mailing list thread where someone else did).
    >>> 
    >>> indeed, and they are faster.
    >>> Martin
    >>> 
    >>
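
For the case where both arguments are vectors, agreement with
pmin()/pmax() can be checked along these lines. pmin2 and pmax2 are
copied from the thread; the vectors and seed are made up. Because the
formula goes through x+k and x-k, results may differ from pmin()/pmax()
in the last bits, hence all.equal() rather than identical(), and the
shortcut breaks down when the two magnitudes differ enormously.

```r
pmin2 <- function(k,x) (x+k - abs(x-k))/2
pmax2 <- function(k,x) (x+k + abs(x-k))/2

set.seed(2)            # arbitrary seed, for reproducibility only
x <- rnorm(10000); y <- rnorm(10000)

## Agreement up to floating-point tolerance
stopifnot(all.equal(pmax2(x, y), pmax(x, y)),
          all.equal(pmin2(x, y), pmin(x, y)))

## Cancellation: 1 is lost when added to 1e20
pmin2(1e20, 1)   # 0
pmin(1e20, 1)    # 1
```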