[R] speeding up "sum of squared differences" calculation

Tue Oct 22 15:00:42 CEST 2013

> I am using a sum of squared differences in the objective function of an
> optimization problem I am doing and I have managed to speed it up using
> the outer function versus the nested for loops, but my suspicion is
> that the calculation could be done even quicker.  Please see the code
> below for a simple example.  If anyone can point out a faster way I
> would appreciate it greatly.

First, look at ?system.time ; it's intended for finding out how much compute time a command takes. You're also calculating elapsed time which is dependent on (among other things) other processes running so maybe not the most reliable thing for benchmarking. 

Second, outer() is fast because it avoids loops and uses the fastest vector routines R can (essentially, replicating the vector and then relying on simple subtraction of vectors). In principle calling outer() loses a little time in internal checks on what object types and function you've provided etc.; in practice that doesn't add up to much. You can check by using just the core code from outer with an added multiplication instead of the dimensioning statement:

ssqdif <- function(X, Y=X) {
      #From 'outer' without modification
	Y <- rep(Y, rep.int(length(X), length(Y)))
	X <- rep(X, times = ceiling(length(Y)/length(X)))
   	#For this case:
	sum((X-Y)^2) #SLIGHTLY quicker than d<-X-Y; sum(d*d)
}

system.time(ssqdif(X))

#Comparing outer() method:
system.time({gg <- outer(X, X, FUN="-"); sum(gg*gg)})

There's little practical difference; both hover from 0.00 to 0.03 s system time. I could barely tell the difference even averaged over 100 runs; I was getting an average around 0.007 (system time) and 2.5s user time for both methods.

Conclusion: hard to beat outer() for this purpose in R

S Ellison

*******************************************************************
This email and any attachments are confidential. Any use...{{dropped:8}}