[R] Spatial join – optimizing code

Monica Pisica pisicandru at hotmail.com
Tue Sep 16 18:23:33 CEST 2008


Hi,

A few days ago I asked about a spatial join on minimum distance between two sets of points, each stored with its coordinates and attributes in its own data frame.

Simon Knapp sent code that does this using distances on a sphere computed from latitude/longitude coordinates. I changed his code to use Euclidean distances, since my data are in UTM coordinates.

Typically the first data frame has around 30,000 points and the classification data frame around 4,000. The aim is to add to each point in the first data frame all the attributes of the closest point in the second data frame.
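To make the intended result concrete, here is a minimal sketch on made-up toy data; the column names `xeast`, `xnorth`, `yeast`, `ynorth` and the attribute `class` are hypothetical. For each point in `x` it computes the squared distances to all `y` points in one vectorized expression and takes `which.min()`. Squared distances pick the same nearest point as true distances, so the square root can be skipped when only the index is needed.

```r
# hypothetical toy data: 3 points to classify, 2 classified points
x <- data.frame(xeast = c(0, 10, 9), xnorth = c(0, 0, 1))
y <- data.frame(yeast = c(1, 10), ynorth = c(1, 1), class = c("A", "B"))

# index of the nearest y point for each x point;
# squared distance is enough because sqrt() does not change the ordering
nearest <- vapply(seq_len(nrow(x)), function(i) {
  which.min((y$yeast - x$xeast[i])^2 + (y$ynorth - x$xnorth[i])^2)
}, integer(1))

# attach the attributes of the nearest y point to each x point
joined <- cbind(x, y[nearest, "class", drop = FALSE])
row.names(joined) <- NULL
print(joined)
```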

On my PC (Dell OptiPlex GX620, x86-based, 4 GB RAM, 3192 MHz processor) the join took a very long time, about 2.3 hours:

   user  system elapsed 
8166.07    2.98 8194.43 
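A rough back-of-the-envelope check, assuming the approximate sizes quoted above (30,000 and 4,000 points): the nested `apply()` calls evaluate the R-level `dist()` once per pair of points, so the elapsed time works out to tens of microseconds per pair. A single Euclidean distance needs only a handful of floating-point operations, which suggests that most of the time goes into per-call interpreter overhead rather than arithmetic.

```r
n.x <- 30000          # approximate number of points to classify
n.y <- 4000           # approximate number of classified points
elapsed <- 8194.43    # elapsed seconds reported by system.time()

n.pairs <- n.x * n.y              # one dist() call per pair of points
per.pair <- elapsed / n.pairs     # seconds spent per pair
cat(sprintf("%.0f pairs, %.1f microseconds per pair\n", n.pairs, per.pair * 1e6))
# prints: 120000000 pairs, 68.3 microseconds per pair
```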

Sys.info()
                     sysname                      release 
                   "Windows"                         "XP" 
                     version                     nodename 
"build 2600, Service Pack 2"              
                     machine                        
                       "x86"                       

I am running R 2.7.1 patched.
I wonder if any of you could suggest how to optimize this code so that it runs faster, or would have time to help. My programming skills are not advanced enough to do it myself.

Thanks,

Monica

#### code follows:
#### x: a data frame with over 30,000 points; xeast, xnorth are the names
####    of its UTM coordinate columns
#### y: a data frame with over 4,000 points; yeast, ynorth are the names of
####    its UTM coordinate columns, the other columns hold the classification
### calculating Euclidean distance

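# Euclidean distance between (xeast, xnorth) and (yeast, ynorth);
# note that this definition masks stats::dist() in the session
dist <- function(xeast, xnorth, yeast, ynorth) {
  sqrt((xeast - yeast)^2 + (xnorth - ynorth)^2)
}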

### doing the merge by location with minimum distance

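dist.merge <- function(x, y, xeast, xnorth, yeast, ynorth){
  # for each row of x: c(index of the nearest y point, distance to it)
  tmp <- t(apply(x[, c(xeast, xnorth)], 1, function(xp, yc){
    # distance from this x point (xp) to every y point, one call per y row
    dists <- apply(yc, 1, function(yp, xp) dist(xp[1], xp[2], yp[1], yp[2]), xp)
    # keep the first row that attains the minimum distance
    cbind(1:nrow(yc), dists)[dists == min(dists), , drop = FALSE][1, ]
  }, y[, c(yeast, ynorth)]))
  # append min.dist and the non-coordinate attributes of the nearest y point;
  # drop = FALSE keeps a data frame even if y has a single attribute column
  tmp <- cbind(x, min.dist = tmp[, 2],
               y[tmp[, 1], -match(c(yeast, ynorth), names(y)), drop = FALSE])
  row.names(tmp) <- NULL
  tmp
}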

#### code end
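One possible way to speed this up, offered as an untested sketch rather than a definitive replacement: avoid calling an R function for every pair of points and let vectorized arithmetic do the work. The hypothetical `dist.merge.block()` below takes the same arguments as `dist.merge()` (coordinate column names given as strings) plus an assumed `block` size. It builds squared-distance matrices for blocks of `x` rows with `outer()` and picks the nearest `y` point in each row with `max.col()`.

```r
# Hypothetical drop-in for dist.merge(): same arguments and output layout,
# but distances are computed for a block of x rows at once with outer().
dist.merge.block <- function(x, y, xeast, xnorth, yeast, ynorth, block = 1000) {
  xe <- x[[xeast]]; xn <- x[[xnorth]]
  ye <- y[[yeast]]; yn <- y[[ynorth]]
  idx <- integer(nrow(x))    # index of the nearest y point for each x row
  mind <- numeric(nrow(x))   # distance to that nearest point
  for (b in seq_len(ceiling(nrow(x) / block))) {
    rows <- ((b - 1) * block + 1):min(b * block, nrow(x))
    # squared distances: one row per x point in this block, one column per y point
    d2 <- outer(xe[rows], ye, "-")^2 + outer(xn[rows], yn, "-")^2
    # column of each row minimum; "first" keeps the first minimum, as the original does
    j <- max.col(-d2, ties.method = "first")
    idx[rows] <- j
    mind[rows] <- sqrt(d2[cbind(seq_along(rows), j)])
  }
  out <- cbind(x, min.dist = mind,
               y[idx, -match(c(yeast, ynorth), names(y)), drop = FALSE])
  row.names(out) <- NULL
  out
}
```

With the data described above it would be called as `dist.merge.block(x, y, "xeast", "xnorth", "yeast", "ynorth")`. With `block = 1000` and about 4,000 `y` points, each intermediate matrix holds 4 million doubles (about 32 MB), so `block` trades memory for fewer loop iterations. For much larger problems, a k-d tree nearest-neighbour search from a contributed package (for example `nn2()` in RANN) would avoid comparing every pair; that is not base R and is not shown here.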


More information about the R-help mailing list