[R] Possible Improvement of the R code

William Dunlap wdunlap at tibco.com
Mon Sep 17 17:52:51 CEST 2012


Unlike C or C++, subscripting vector or matrix is not almost free.
Also, subscripting a matrix tends to make more time than subscripting a
vector so it can help to pull out the previous row of params once
per iteration, use vector subscripting on the that row to build the new
row, and then insert the new row back into params with one matrix
subscripting call:
f5 <- function(param) {
    for(i in 2:11) {
        # do matrix subscripting only twice per iteration
        prev <- param[i-1, ]
        param[i,] <- c(
            prev[3] + prev[4]/2,
            prev[4]/2 + prev[5],
            prev[1] * (prev[3] + prev[4]/2),
            prev[1] * (prev[4]/2 + prev[5]) + prev[2] * (prev[3] + prev[4]/2),
            prev[2] * (prev[4]/2 + prev[5]))
    }
    param
}
You can also do as much as possible to not repeat any subscripting or
sums:
f6 <- function(param) {
    for(i in 2:11) {
        # do any subscripting only twice/iteration and
        # repeated sums only once/iteration.
        prev <- param[i-1, ]
        p1 <- prev[1]
        p2 <- prev[2]
        p42 <- prev[4]/2
        p342 <- prev[3] + p42
        p425 <- p42 + prev[5]
        param[i,] <- c(
            p342,
            p425,
            p1 * p342,
            p1 * p425 + p2 * p342,
            p2 * p425)
    }
    param
}

I was curious about how much the compiler package could replace hand-optimization
so I timed these with and without compiling.  (f1 and f3 are Berend's f1 and f3; I renamed
his f2 and f4 to f1c and f3c to indicate they were the compiled versions.)
> funs <- list(f1=f1, f1c=f1c, f3=f3, f3c=f3c, f5=f5, f5c=f5c, f6=f6, f6c=f6c)
> z <- vapply(funs, function(f)system.time(for(i in 1:10000)f(param))[1:3], numeric(3))
> z
            f1  f1c   f3  f3c   f5  f5c   f6 f6c
user.self 5.03 2.82 3.74 1.89 2.73 1.78 1.54 0.9
sys.self  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.0
elapsed   5.03 2.83 3.75 1.90 2.76 1.77 1.54 0.9
> matrix(z["user.self", ], byrow=TRUE, ncol=2, dimnames=list(c("f1","f3","f5","f6"), c("raw","compiled")))
    raw compiled
f1 5.03     2.82
f3 3.74     1.89
f5 2.73     1.78
f6 1.54     0.90
It looks like both hand- and machine-optimization help quite a bit here.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Berend Hasselman
> Sent: Sunday, September 16, 2012 9:28 PM
> To: li li
> Cc: r-help
> Subject: Re: [R] Possible Improvement of the R code
> 
> 
> On 17-09-2012, at 00:51, li li wrote:
> 
> > Dear all,
> >   In the following code, I was trying to compute each row of the "param"
> > iteratively based
> > on the first row.
> >   This likely is not the best way. Can anyone suggest a simpler way to
> > improve the code.
> >   Thanks a lot!
> >         Hannah
> >
> >
> > param <- matrix(0, 11, 5)
> >
> > colnames(param) <- c("p", "q", "r", "2s", "t")
> >
> > param[1,] <- c(0.5, 0.5, 0.4, 0.5, 0.1)
> >
> > for (i in 2:11){
> >
> > param[i,1] <- param[(i-1),3]+param[(i-1),4]/2
> >
> > param[i,2] <- param[(i-1),4]/2+param[(i-1),5]
> >
> > param[i,3] <- param[(i-1),1]*(param[(i-1),3]+param[(i-1),4]/2)
> >
> > param[i,4] <- param[(i-1),1]*(param[(i-1),4]/2+param[(i-1),5])+param[(i-1),2
> > ]*(param[(i-1),3]+param[(i-1),4]/2)
> >
> > param[i,5] <- param[(i-1),2]*(param[(i-1),4]/2+param[(i-1),5])
> >
> > }
> >
> 
> You can use the compiler package.
> It also helps if you don't repeat certain calculations. For example (param[(i-
> 1),3]+param[(i-1),4]/2) is computed three times.
> Once is enough.
> 
> See this example where your code has been put in function f1. The simplified code is in
> function f3.
> Functions f2 and f4 are the compiled versions of f1 and f3.
> 
> library(compiler)
> library(rbenchmark)
> param <- matrix(0, 11, 5)
> colnames(param) <- c("p", "q", "r", "2s", "t")
> param[1,] <- c(0.5, 0.5, 0.4, 0.5, 0.1)
> 
> # your calculation
> f1 <- function(param) {
>     for (i in 2:11){
>         param[i,1] <- param[(i-1),3]+param[(i-1),4]/2
>         param[i,2] <- param[(i-1),4]/2+param[(i-1),5]
>         param[i,3] <- param[(i-1),1]*(param[(i-1),3]+param[(i-1),4]/2)
>         param[i,4] <- param[(i-1),1]*(param[(i-1),4]/2+param[(i-1),5])+param[(i-
> 1),2]*(param[(i-1),3]+param[(i-1),4]/2)
>         param[i,5] <- param[(i-1),2]*(param[(i-1),4]/2+param[(i-1),5])
>     }
> 
>     param
> }
> 
> f2 <- cmpfun(f1)
> 
> # modified by replacing identical sub-expressions with result
> f3 <- function(param) {
>     for (i in 2:11){
>         param[i,1] <- param[(i-1),3]+param[(i-1),4]/2
>         param[i,2] <- param[(i-1),4]/2+param[(i-1),5]
>         param[i,3] <- param[(i-1),1]*param[i,1]
>         param[i,4] <- param[(i-1),1]*param[i,2]+param[(i-1),2]*param[i,1]
>         param[i,5] <- param[(i-1),2]*param[i,2]
>     }
> 
>     param
> }
> 
> f4 <- cmpfun(f3)
> 
> z1 <- f1(param)
> z2 <- f2(param)
> z3 <- f3(param)
> z4 <- f4(param)
> 
> Running in R
> 
> > all.equal(z2,z1)
> [1] TRUE
> > all.equal(z3,z1)
> [1] TRUE
> > all.equal(z4,z1)
> [1] TRUE
> >
> > benchmark(f1(param), f2(param), f3(param), f4(param),replications=5000,
> columns=c("test", "replications", "elapsed", "relative"))
>        test replications elapsed relative
> 1 f1(param)         5000   3.748    2.502
> 2 f2(param)         5000   2.104    1.405
> 3 f3(param)         5000   2.745    1.832
> 4 f4(param)         5000   1.498    1.000
> 
> f4 is quite an improvement over f1.
> It's quite possible that more can be gained but I'm too lazy to investigate further.
> 
> Berend
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list