[R] Speedups with Ra and jit

Luke Tierney luke at stat.uiowa.edu
Sun May 4 20:18:21 CEST 2008


A couple of comments on this and the original thread.

As pointed out by several posters, in a vectorized language like R one
can usually create the fastest and cleanest code by using vectorized
operations.  This is as true of Matlab as it is of R.
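
For instance (an illustrative sketch with hypothetical numeric vectors
x and y, not code from the original thread), an element-by-element loop
such as

     z <- numeric(length(x))
     for (i in seq_along(x)) z[i] <- x[i] * y[i]   # one interpreted step per element

can usually be replaced by a single vectorized expression,

     z <- x * y                                    # one call into compiled code

which is both clearer and much faster.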

That said, there are at times reasons for using code consisting of a
set of tight nested loops, such as

     it may not be obvious how to rewrite the loops using
     vectorization

     the code may match a published algorithm and so verifying its
     correctness and maintaining it in this form may be easier than
     rewriting

     the code may not be fast or pretty but it works and gets the right
     answer

Given these points it would be nice if R's interpreter could do a
better job.  R's interpreter is currently rather slow on these sorts
of computations.  For example, a direct translation of the original
poster's grw_permute example runs about 5 times as fast in the
xlispstat interpreter (similar internals in many ways).  There are
some things we could do with relatively little effort to marginally
improve R's interpreter, but significant improvements would likely
require a fairly major rewrite (which may be worth considering at some
point but probably not soon).

An alternative is to pursue automated tools to transform this sort of
code into faster code.  This is in part the motivation for creating a
byte code compiler for R, and for the Ra/jit project.  The current
byte code compiler (http://www.stat.uiowa.edu/~luke/R/compiler/)
speeds up this computation by about a factor of 3.  This compiler does
not yet optimize variable lookup or indexing of vectors or matrices;
these should be added sometime this summer and should add another
factor of 3 or so for this example.
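
To give a hedged picture of how the compiler is used -- assuming the
package's cmpfun() interface, and with the function names slowsum and
fastsum being purely illustrative -- compiling a function looks roughly
like this (timings will of course vary by machine):

     library(compiler)                  # the byte code compiler package
     slowsum <- function(x) {           # the same kind of loop-heavy code
         s <- 0
         for (i in seq_along(x)) s <- s + x[i]
         s
     }
     fastsum <- cmpfun(slowsum)         # byte-compile the function
     x <- as.numeric(1:1000000)
     system.time(slowsum(x))            # interpreted version
     system.time(fastsum(x))            # byte-compiled version, typically faster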

Steve has done some very interesting things in his Ra/jit project.
Some of the optimizations done there are either already done in the
byte code compiler or have been planned additions for a while.
Others, in particular specialization for basic data types, may be best
done at run time as jit does, and there may be room for merging these
ideas with the static byte code compiler.

The particular specialization approach used in jit means that some
code will produce different results or generate errors; i.e. a user
who requests jit compilation is implicitly agreeing not to try to do
certain things, such as change types or sizes of values stored in
variables used in the compilation.  In the long run I would prefer
either a mechanism in which such assumptions are declared explicitly
by the user, or an arrangement whereby R automatically switches back
to less optimized code when those assumptions are violated.
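
As a purely hypothetical sketch (the function h and its behavior under
jit are not taken from the jit documentation), the kind of code such a
user is agreeing not to write looks like this; it runs in plain R, but
it changes the type of x part way through the compiled loop:

     h <- function(x) {
         jit()                              # request jit compilation
         s <- 0
         for (i in seq_along(x)) {
             s <- s + x[i]
             if (i == 1) x <- as.integer(x) # changes the type of x inside the
                                            # compiled loop -- the kind of thing
                                            # the jit user is agreeing not to do
         }
         s
     }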

I believe the main aspect of runtime specialization done in jit now
that may be hard to match in statically compiled byte code is that
a function defined as

     f <- function(x) {
         jit()
         s <- 0
         for (i in seq_along(x)) s <- s + x[i]
         s
     }

will be optimized for integer data when run on integer x and for real
data when run on real x, and in both cases allocation of intermediate
results is avoided.  How valuable this is in the long run is not yet
clear -- it would definitely be very helpful if machine code were being
generated that also allowed intermediate values to stay in registers
(which I believe is what psyco does), but that is messy and hard to do
across many platforms.  With fast allocation the benefits of avoiding
allocation alone may not be that substantial.
For example, the byte compiled xlispstat version of the grw_permute
example mentioned above runs about twice as fast as the Ra/jit one
without avoiding intermediate allocation.  This isn't conclusive of
course and it will be interesting to do some more careful tests and
see what directions those suggest.
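
To make the specialization concrete, here is a hedged usage sketch for
the f above (the names and vector size are illustrative, and the
timings will depend on the machine and on the jit version):

     xi <- 1:1000000               # integer input: loop specialized for integers
     xr <- as.numeric(xi)          # the same values as doubles
     system.time(f(xi))            # run once on integer data
     system.time(f(xr))            # run again on real data; jit respecializes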

Best,

luke


On Fri, 2 May 2008, Bill.Venables at csiro.au wrote:

> The topic of Ra and jit has come up on this list recently
>
> (see http://www.milbo.users.sonic.net/ra/index.html)
>
> so I thought people might be interested in this little demo. For it I
> used my machine, a 3-year-old laptop with 2 GB of memory running Windows
> XP, and the good old convolution example, the same one as used on the
> web page, (though the code on the web page has a slight glitch in it).
>
> This is using Ra with R-2.7.0.
>
>> conv1 <- function(a, b) {
>> ### with Ra and jit
>      require(jit)
>      jit(1)
>      ab <- numeric(length(a)+length(b)-1)
>      for(i in 1:length(a))
>        for(j in 1:length(b))
>          ab[i+j-1] <- ab[i+j-1] + a[i]*b[j]
>      ab
>    }
>>
>> conv2 <- function(a, b) {
>> ### with just Ra
>      ab <- numeric(length(a)+length(b)-1)
>      for(i in 1:length(a))
>        for(j in 1:length(b))
>          ab[i+j-1] <- ab[i+j-1] + a[i]*b[j]
>      ab
>    }
>>
>> x <- 1:2000
>> y <- 1:500
>> system.time(tst1 <- conv1(x, y))
>   user  system elapsed
>   0.53    0.00    0.55
>> system.time(tst2 <- conv2(x, y))
>   user  system elapsed
>   9.49    0.00    9.56
>> all.equal(tst1, tst2)
> [1] TRUE
>>
>> 9.56/0.55
> [1] 17.38182
>>
>
> However for this example you can achieve speed-ups like that or better
> just using vectorised code intelligently:
>
>> conv3 <- local({
>      conv <- function(a, b, na, nb) {
>        r <- numeric(na + nb -1)
>        ij <- 1:nb
>        for(e in a) {
>          r[ij] <- r[ij] + e*b
>          ij <- ij + 1
>        }
>        r
>      }
>
>      function(a, b) {
>        na <- length(a)
>        nb <- length(b)
>        if(na < nb) conv(a, b, na, nb) else
>                    conv(b, a, nb, na)
>      }
>    })
>>
>> system.time(tst3 <- conv3(x, y))
>   user  system elapsed
>   0.11    0.00    0.11
>> all.equal(tst1, tst3)
> [1] TRUE
>
>> 0.55/0.11
> [1] 5
>> 9.56/0.11
> [1] 86.90909
>
> i.e. a further 5-fold increase in speed, or about 87 times faster than
> the unassisted naïve code.
>
> I think the lesson here is that if you really want to write R code as
> you might write C code, then jit can help make it practical in terms
> of time.  On the other hand, if you want to write R code using as many
> of the inbuilt operators as you can, then you can possibly still do
> things better.
>
> Of course sometimes you don't have the right inbuilt operators.  In
> that case you have a three-way choice: slow R code and wait; faster R
> code speeded up with Ra and jit; or (the way it probably should be
> done) dynamically loaded C or Fortran code.  Portability decreases as
> you go, of course.
>
>
> Bill Venables
>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

