[R] replacing a for-loop with lapply

Mon May 9 18:31:50 CEST 2005

Dear All,

I am trying to compute a goodness-of-fit statistic for a copula, based on an
empirical density estimate of this copula. 
To do this I can use the following code:

> n <- dim(data)[1]
> d <- dim(data)[2]
> Chat <- rep(0,n)
> for(i in 1:n)
+ Chat[i] <- sum(apply(t(data)<=data[i,],2,prod))/(n+1)
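To make the computation concrete, here is a small self-contained sketch on a made-up 4x2 matrix (the name `toy` and its values are invented purely for illustration, they are not my real data). For each row i it counts the rows that are componentwise <= row i and divides by n+1:

```r
# Toy data, invented for illustration only
toy <- matrix(c(0.1, 0.2,
                0.3, 0.4,
                0.2, 0.9,
                0.5, 0.5), ncol = 2, byrow = TRUE)
n <- nrow(toy)
Chat_toy <- numeric(n)
for (i in 1:n) {
  # t(toy) is d x n; toy[i, ] is recycled down each column, so column j
  # holds the componentwise comparison toy[j, ] <= toy[i, ]
  below <- apply(t(toy) <= toy[i, ], 2, prod)
  # below[j] is 1 only if every component of row j is <= row i
  Chat_toy[i] <- sum(below) / (n + 1)
}
Chat_toy
# 0.2 0.4 0.4 0.6
```

For example, only rows 1 and 2 are componentwise <= row 2, so the second entry is 2/5.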

However, I have a feeling this can be done more efficiently than with a
for-loop. I have also tried the following:

> tmp1 <- lapply(1:n,function(i) t(data)<=data[i,])
> tmp2 <- lapply(1:n,function(i) apply(tmp1[[i]],2,prod))
> Chat <- as.numeric(lapply(1:n, function(i) sum(tmp2[[i]])/(n+1)))

but there is no improvement. I ran the following timing test:

> data <- matrix(runif(300),100,3)
> n <- dim(data)[1]
> d <- dim(data)[2]
> Chat <- vector("numeric",n)
> M <- 30
> a <- rep(0,M)
> for(m in 1:M){
+ a[m] <- system.time({
+ tmp1 <- lapply(1:n,function(i) t(data)<=data[i,])
+ tmp2 <- lapply(1:n,function(i) apply(tmp1[[i]],2,prod))
+ Chat <- as.numeric(lapply(1:n, function(i) sum(tmp2[[i]])/(n+1)))})[3]}
> b <- rep(0,M)
> for(m in 1:M){
+ b[m] <- system.time(
+ for (i in 1:n)
+ Chat[i] <- sum(apply(t(data)<=data[i,],2,prod))/(n+1))[3]}
> summary(a)
> summary(b)

and the output was:

> summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.8500  0.8700  0.8900  0.9013  0.9300  0.9800 
> summary(b)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.8400  0.8600  0.8800  0.8883  0.9075  0.9900

Is there any way I can code this more efficiently in R or will I have to
turn to C? The data sets, on which I am actually going to run this code,
will be of sizes up to (5000x100) and I need hundreds of realizations...
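One direction that might be worth timing, offered only as a sketch and not as a tested solution: loop over the d columns instead of the n rows, and build the full n x n comparison matrix with outer(). The function names `chat_outer` and `chat_loop` are hypothetical, introduced here only for the comparison. The price is an n x n logical matrix (plus temporaries), which for n = 5000 is on the order of 100 MB, so whether this actually helps on the real data would have to be measured.

```r
# Hypothetical helper: same statistic as the for-loop, looping over columns
chat_outer <- function(data) {
  n <- nrow(data)
  # below[i, j] ends up TRUE when data[j, ] <= data[i, ] in every column
  below <- matrix(TRUE, n, n)
  for (k in seq_len(ncol(data))) {
    # outer(x, x, ">=")[i, j] is x[i] >= x[j], i.e. data[j, k] <= data[i, k]
    below <- below & outer(data[, k], data[, k], ">=")
  }
  rowSums(below) / (n + 1)
}

# Reference: the original for-loop, wrapped in a function for comparison
chat_loop <- function(data) {
  n <- nrow(data)
  Chat <- numeric(n)
  for (i in 1:n)
    Chat[i] <- sum(apply(t(data) <= data[i, ], 2, prod)) / (n + 1)
  Chat
}

set.seed(1)
x <- matrix(runif(300), 100, 3)
# Check that both versions agree on simulated data
agree <- isTRUE(all.equal(chat_outer(x), chat_loop(x)))
agree
```

If the n x n matrix does not fit in memory, the same idea could presumably be applied to blocks of rows at a time, but I have not tried that.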

Thank you for your time.

Rgds,
Daniel