[R] More simple implementation is slow.

Sat Jun 9 13:07:51 CEST 2012

On Jun 9, 2012, at 11:08 , wl2776 wrote:

> Hi all.
> I'm developing a function, which must return a square matrix.
> 
> Here is the code:
> http://pastebin.com/THzEW9N7
> 
> These functions implement the equivalent of two nested for loops.
> 
> The first variant creates the resulting matrix by columns, cbind()-ing them
> one by one.
> The second variant creates a two-column matrix whose rows contain all
> possible combinations of i and j, and calls apply() on them.
> 
> The test input (data frame cp.table) can be produced with the following
> commands:
>> n<-132
>> cpt<-data.frame(x=runif(n, min=0, max=100), y=runif(n, min=0, max=100),
>> la=runif(n, min=0, max=360), phi=runif(n, min=-90, max=90))
> Any random data will do.
> 
> The second variant seems to me much more readable and beautiful.
> However, the first ugly variant runs much faster.
> Why??
> Here are the profiles:
> 

Nope, they weren't... (no profiles came through with the post.)

Anyway, you're looping over all N^2 (i,j) combinations, with complex indexing at every step, and without making proper use of vectorization. As far as I can tell, what you're computing is just the matrix of pairwise Euclidean distances between the (x, y) points, i.e.

with(cp.table,
  sqrt(outer(x, x, "-")^2 + outer(y, y, "-")^2)
)

or even

dist(cp.table[1:2])

both of which should be a good deal faster.
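
For what it's worth, here is a minimal sketch that checks the two suggestions against an explicit double loop on the test data from the original post. It assumes the pastebin function really does compute plain Euclidean distance on the x and y columns (I have only inferred that), and the set.seed() call and the names d.outer, d.dist and d.loop are my own additions for illustration. Note that dist() returns a compact "dist" object, so as.matrix() is needed to get the square matrix.

```r
# Test data as in the original post; the seed is added only for reproducibility.
set.seed(1)
n <- 132
cp.table <- data.frame(x = runif(n, min = 0, max = 100),
                       y = runif(n, min = 0, max = 100),
                       la = runif(n, min = 0, max = 360),
                       phi = runif(n, min = -90, max = 90))

# Vectorized pairwise distances via outer(): an n x n matrix.
d.outer <- with(cp.table,
  sqrt(outer(x, x, "-")^2 + outer(y, y, "-")^2)
)

# dist() stores only the lower triangle; as.matrix() expands it to n x n.
d.dist <- as.matrix(dist(cp.table[1:2]))

# Explicit double loop, for comparison only (assumed equivalent of the
# original functions, which are not reproduced here).
d.loop <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d.loop[i, j] <- sqrt((cp.table$x[i] - cp.table$x[j])^2 +
                         (cp.table$y[i] - cp.table$y[j])^2)
  }
}

# All three should agree up to numerical tolerance
# (as.matrix(dist()) carries dimnames, hence check.attributes = FALSE).
all.equal(d.outer, d.dist, check.attributes = FALSE)
all.equal(d.outer, d.loop)
```

Wrapping each variant in system.time() should show the difference in speed; I have not quoted timings since they depend on the machine.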

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com