[R] Vectorizing integrate()

Fri Dec 7 19:37:06 CET 2012

On 12/7/2012 9:40 AM, Berend Hasselman wrote:
> On 07-12-2012, at 18:12, Spencer Graves wrote:
>
>>       Has anyone suggested using the byte code compiler "compiler" package?  An analysis by John Nash suggested to me that it may be roughly equivalent to vectorization;  see "http://rwiki.sciviews.org/doku.php?id=tips:rqcasestudy&s=compiler".
>>
>>
> Not yet.
> But here are some results for alternative ways of doing what  the OP wanted.
>
>
> # Initial parameters
>
> N <- 1000
> B <- c(0,1)
> sem1 <- runif(N, 1, 2)
> x <- rnorm(N)
> X <- cbind(1, x)
>
> # load compiler package
>
> library(compiler)
>
> # functions
>
> # Original loop solution with function fun defined outside loop
>
> f1 <- function(X, B, x, sem1) {
>      eta <- numeric(nrow(X))
>      fun <- function(u, m, s) 1/ (1 + exp(- (B[1] + B[2] * (m + u)))) * dnorm(u, 0, s)
>      for(j in 1:nrow(X)){
>      	eta[j] <- integrate(fun, -Inf, Inf, m=x[j], s=sem1[j])$value
>      }
>      eta
> }
> f2 <- cmpfun(f1)
>
> # sapply solution with fun defined outside function
>
> fun <- function(u, m, s) 1/ (1 + exp(- (B[1] + B[2] * (m + u)))) * dnorm(u, 0, s)
>
> f3 <- function(X, B, x, sem1) sapply(1:nrow(X), function(i) integrate(fun, -Inf, Inf, m=x[i], s=sem1[i])$value)
> f4 <- cmpfun(f3)
>
> # sapply solution with fun defined within function
>
> f5 <- function(X, B, x, sem1) {
>      fun <- function(u, m, s) 1/ (1 + exp(- (B[1] + B[2] * (m + u)))) * dnorm(u, 0, s)
>      sapply(1:nrow(X), function(i) integrate(fun, -Inf, Inf, m=x[i], s=sem1[i])$value)
> }
> f6 <- cmpfun(f5)
>
> # Testing
>
> eta1 <- f1(X, B, x, sem1)
> eta2 <- f2(X, B, x, sem1)
> eta3 <- f3(X, B, x, sem1)
> eta4 <- f4(X, B, x, sem1)
> eta5 <- f5(X, B, x, sem1)
> eta6 <- f6(X, B, x, sem1)
>
> identical(eta1,eta2)
> identical(eta1,eta3)
> identical(eta1,eta4)
> identical(eta1,eta5)
>
> library(rbenchmark)
>
> benchmark(eta1 <- f1(X, B, x, sem1), eta2 <- f2(X, B, x, sem1), eta3 <- f3(X, B, x, sem1),
>            eta4 <- f4(X, B, x, sem1), eta5 <- f5(X, B, x, sem1), eta6 <- f6(X, B, x, sem1),
>            replications=10, columns=c("test","elapsed","relative"))
>
>
> # Results
>
>> identical(eta1,eta2)
> [1] TRUE
>> identical(eta1,eta3)
> [1] TRUE
>> identical(eta1,eta4)
> [1] TRUE
>> identical(eta1,eta5)
> [1] TRUE
>> library(rbenchmark)
>>
>> benchmark(eta1 <- f1(X, B, x, sem1), eta2 <- f2(X, B, x, sem1), eta3 <- f3(X, B, x, sem1),
> +           eta4 <- f4(X, B, x, sem1), eta5 <- f5(X, B, x, sem1), eta6 <- f6(X, B, x, sem1),
> +           replications=10, columns=c("test","elapsed","relative"))
>                         test elapsed relative
> 1 eta1 <- f1(X, B, x, sem1)   1.873    1.207
> 2 eta2 <- f2(X, B, x, sem1)   1.552    1.000
> 3 eta3 <- f3(X, B, x, sem1)   1.807    1.164
> 4 eta4 <- f4(X, B, x, sem1)   1.841    1.186
> 5 eta5 <- f5(X, B, x, sem1)   1.852    1.193
> 6 eta6 <- f6(X, B, x, sem1)   1.601    1.032
>
> As you can see using the compiler package is beneficial speedwise.
> f2 and f6, both the the result of using the compiler package, are the quickest.
> It's quite likely that more can be eked out of this.


       So the compiler (f2, f4, f6) provided a slight improvement over 
f1 and f3 but not f2, and in any event, the improvement was not great.


       Spencer
>
> Berend
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>