[R] use loop or use apply?

Prasenjit Kapat kapatp at gmail.com
Fri May 18 01:56:12 CEST 2007


Hi,

I have two matrices, A (axd) and B (bxd). I want to get another matrix C (axb) 
such that, C[i,j] is the Euclidean distance between the ith row of A and jth 
row of B. In general, I can say that C[i,j] = some.function (A[i,], B[j,]). 
What is the best method for doing so? (assume a < b)

I have been doing some exploration myself: Consider the following function: 
get.f, in which, 'method=1' is the rudimentary double for loop; 'method=2' 
avoids one loop by constructing a bigger matrix, but doesn't use 
apply(); 'method=3' avoids both the loops by using apply() and constructing 
bigger matrices; 'method=4' avoids constructing bigger matrices by using 
apply() twice.

get.f <- function (A, B, method=2) {
	if (method == 1){
		a <- nrow(A); b <- nrow(B);
		C <- matrix(NA, nrow=a, ncol=b);
		for (i in 1:a) 
			for (j in 1:b) 
				C[i,j] <- sum((A[i,]-B[j,])^2)
	} else if (method == 2 ) {
		a <- nrow(A); b <- nrow(B); d <- ncol(A);
		C <- matrix(NA, nrow=a, ncol=b);
		for (i in 1:a) 
			C[i,] <- rowSums((matrix(A[i,], nrow=b, ncol=d, byrow=TRUE) - B) ^ 2)
	} else if (method == 3) {
			C <- t(apply(A, MARGIN=1, FUN="FUN1", BB=B)); # transpose is needed
	} else if (method == 4) {
			C <- t(apply(A, MARGIN=1, FUN="FUN2", BB=B))
	}
}

FUN1 <- function(aa, BB)
  return(rowSums(
		(matrix(aa, nrow=nrow(BB), ncol=ncol(BB), byrow=TRUE) - BB)^2)
  )

FUN2 <- function(aa, BB)
	return(apply(BB, MARGIN=1, FUN="FUN3", aa=aa))

FUN3 <- function(bb,aa) return(sum((aa-bb)^2))

### With these methods and the following intitializations,

a <- 100; b <- 1000; d <- 100; n.loop <- 20;

A <- matrix(rnorm(a*d), ncol=d)
B <- matrix(rnorm(b*d), ncol=d)

all.times <- matrix(0,nrow=5,ncol=4)
rownames(all.times) <- rownames(as.matrix(system.time(NULL)))

for (i in 1:4)  
	for (j in 1:n.loop)
		all.times[,i] <- all.times[,i] + 
				as.matrix(system.time(C <- get.f(A=A, B=B, method=i)))

all.times <- all.times / n.loop
print(all.times)

               [,1]    [,2]    [,3]    [,4]
user.self   4.0554 1.50010 1.50130 4.51285
sys.self     0.0370 0.02420 0.01800 0.04260
elapsed    4.2705 1.58865 1.59475 6.07535
user.child 0.0000 0.00000 0.00000 0.00000
sys.child   0.0000 0.00000 0.00000 0.00000

'method=2' stands out be the best and 'method=1' (for loops) beats 'method=4' 
(two apply()s)... Is that expected?

Is it possible to improve over 'method=2'?

Thanks
PK

PS: The mail text seems fine in my composer, I hope, it looks decent in your 
reader.



More information about the R-help mailing list