[Rd] wish list: generalized apply

John P. Nolan jpnolan at american.edu
Fri Dec 9 02:00:22 CET 2016

-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net] 
Sent: Thursday, December 8, 2016 4:59 PM
To: John P. Nolan <jpnolan at american.edu>
Cc: Charles C. Berry <R-devel at r-project.org>
Subject: Re: [Rd] wish list: generalized apply

> On Dec 8, 2016, at 12:09 PM, John P. Nolan <jpnolan at american.edu> wrote:
> Dear All,
> I regularly want to "apply" some function to an array in a way that the arguments to the user function depend on the index on which the apply is working.  A simple example is:
> A <- array( runif(160), dim=c(5,4,8) )
> x <- matrix( runif(32), nrow=4, ncol=8 )
> b <- runif(8)
> f1 <- function( A, x, b ) { sum( A %*% x ) + b }
> result <- rep(0.0,8)
> for (i in 1:8) {  result[i] <- f1( A[,,i], x[,i] , b[i] ) }
> This works, but is slow.  I'd like to be able to do something like:
>    generalized.apply( A, MARGIN=3, FUN=f1, list(x=x,MARGIN=2), list(b=b,MARGIN=1) ), where the lists tell generalized.apply to pass x[,i] and b[i] to FUN in addition to A[,,i].  
> Does such a generalized.apply already exist somewhere?  While I can write a C function to do a particular case, it would be nice if there was a fast, general way to do this.  
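A minimal sketch of what the requested function could look like in plain R follows. It assumes the list(name = value, MARGIN = k) convention from the proposed call above. generalized_apply and take_slice are hypothetical names, not functions in base R. The sketch still loops at the R level, so it shows the interface rather than the speed a dedicated C routine would give.

```r
# Hypothetical helper: the i-th slice of X along dimension `margin`,
# with that dimension dropped (e.g. take_slice(A, 3, i) is A[, , i]).
take_slice <- function(X, margin, i) {
  d <- dim(X)
  if (is.null(d)) return(X[i])        # plain vector: treated as margin 1
  idx <- rep(list(TRUE), length(d))   # TRUE selects every index along a dimension
  idx[[margin]] <- i
  do.call(`[`, c(list(X), idx, list(drop = TRUE)))
}

# Hypothetical generalized apply: each extra argument is a list such as
# list(x = x, MARGIN = 2), meaning "pass the i-th slice of x as argument x".
generalized_apply <- function(X, MARGIN, FUN, ...) {
  FUN <- match.fun(FUN)
  extras <- list(...)
  sapply(seq_len(dim(X)[MARGIN]), function(i) {
    args <- list(take_slice(X, MARGIN, i))   # first, positional argument
    for (e in extras) {
      nm <- setdiff(names(e), "MARGIN")       # the argument name, e.g. "x"
      args[[nm]] <- take_slice(e[[nm]], e$MARGIN, i)
    }
    do.call(FUN, args)
  })
}

# Usage with the toy data from the message above.
set.seed(1)
A <- array(runif(160), dim = c(5, 4, 8))
x <- matrix(runif(32), nrow = 4, ncol = 8)
b <- runif(8)
f1 <- function(A, x, b) { sum(A %*% x) + b }
result <- generalized_apply(A, MARGIN = 3, FUN = f1,
                            list(x = x, MARGIN = 2), list(b = b, MARGIN = 1))
```

Passing TRUE as a subscript is what lets one helper slice arrays of any rank without writing out the empty commas of A[, , i] by hand.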

I would have thought that this would achieve the same result:

result <- sapply( seq_along(b) , function(i) { f1( A[,,i], x[,i] , b[i] )} )

or, indexing by the third dimension of A:

result <- sapply( seq.int( dim(A)[3] ) , function(i) { f1( A[,,i], x[,i] , b[i] )} )

(I doubt it will be any faster, but if the number of iterations is large, parallelism might help. The inner function appears to be fairly efficient.)
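One way to try the parallelism suggestion is parallel::mclapply, which forks worker processes. This is only a sketch: mc.cores = 2 is an arbitrary choice, forking is not available on Windows (where mc.cores must be 1, so the code falls back to that), and for a function as cheap as f1 the process overhead may well outweigh any gain.

```r
library(parallel)

# Same toy data as in the original message.
set.seed(1)
A <- array(runif(160), dim = c(5, 4, 8))
x <- matrix(runif(32), nrow = 4, ncol = 8)
b <- runif(8)
f1 <- function(A, x, b) { sum(A %*% x) + b }

# Forking is unavailable on Windows, so use a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each index i is handled by a worker; mclapply returns a list,
# which unlist() turns back into a numeric vector.
result_par <- unlist(mclapply(seq_len(dim(A)[3]),
                              function(i) f1(A[, , i], x[, i], b[i]),
                              mc.cores = n_cores))
```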

David Winsemius
Alameda, CA, USA


Thanks for the response. I gave a toy example with 8 iterations to illustrate the point, so I thought I would bump it up to make my point about speed. But to my surprise, on a slightly simpler problem, a 'for' loop is FASTER than both 'sapply', as David suggested, and even 'apply'. Here is the example:

n <- 800000; m <- 10; k <- 10
A <- array( 1:(m*n*k), dim=c(m,k,n) )
y <- matrix( 1:(k*n), nrow=k, ncol=n )
b <- 1:n
f1 <- function( A, y, b ) { sum( A %*% y ) + b }

# use a for loop
time1 <- system.time( {
  result <- rep(0.0,n)
  for (i in 1:n) {
    result[i] <- f1( A[,,i], y[,i] , b[i] )
  }
  result } )

#  use sapply
time2 <- system.time( result2 <- sapply( seq.int( dim(A)[3] ) , function(i) { f1( A[,,i], y[,i] , b[i] )} ))

# fix y and b, and use standard apply
time3 <- system.time( result3 <- apply( A, MARGIN=3, FUN=f1, y=y[,1], b=b[1] ) ) 

# user times, then ratios of user times
c( time1[1], time2[1],time3[1]); c( time2[1]/time1[1], time3[1]/time1[1] )  
#   4.84      5.22      5.32 
#   1.078512  1.099174

So the for loop is faster: sapply takes about 8% longer and apply about 10% longer!? Years ago I experimented and found that I could speed things up noticeably by replacing loops with apply. That is no longer the case, at least in this simple experiment. Is this a result of byte code? Can someone tell us when a for loop is going to be slower than apply? A more complicated loop that computes multiple quantities?
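For this particular f1 the loop can be avoided entirely. Summing A[, , i] %*% y[, i] over its rows gives sum over j of (sum over r of A[r, j, i]) * y[j, i], i.e. the dot product of the column sums of the slice with y[, i]. colSums() on a 3-d array sums over the first dimension and returns a k-by-n matrix, so a single vectorized expression covers every slice. This sketch depends on the specific form of f1 and does not answer the general question; no timing is claimed here, and n is reduced so the check runs quickly.

```r
# Same construction as the benchmark above, with a much smaller n.
n <- 1000; m <- 10; k <- 10
A <- array(1:(m * n * k), dim = c(m, k, n))
y <- matrix(1:(k * n), nrow = k, ncol = n)
b <- 1:n
f1 <- function(A, y, b) { sum(A %*% y) + b }

# Loop version, as in the benchmark (preallocated result).
result_loop <- numeric(n)
for (i in 1:n) result_loop[i] <- f1(A[, , i], y[, i], b[i])

# Vectorized version: colSums(A) is a k x n matrix of slice column sums;
# multiplying elementwise by y and taking colSums again gives
# sum(A[, , i] %*% y[, i]) for every i at once.
result_vec <- colSums(colSums(A) * y) + b
```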

