[R] Distances between two datasets of x and y co-ordinates

Sundar Dorai-Raj sundar.dorai-raj at pdf.com
Wed Mar 12 22:24:39 CET 2008



Andrew McFadden said the following on 3/12/2008 1:47 PM:
> Hi all
> 
> I am trying to determine the distances between two datasets of x and y
> points. The number of points in dataset One is very small i.e. perhaps
> 5-10. The number of points in dataset Two is likely to be very large
> i.e. 20,000-30,000. My initial approach was to append the first dataset
> to the second and then carry out the calculation:
> 
> dists <- as.matrix(dist(gis data from 2 * datasets)) 
> 
> However, the memory of the computer is not sufficient. A lot of
> calculations carried out in this situation are unnecessary as I only
> want approx 5 * 20,000 calculations versus 20,000 * 20,000.
> 
> x <- c(2660156,2663703,2658165,2659303,2661531,2660914)
> y <- c(6476767,6475013,6475487,6479659,6477004,6476388)
> data2<-cbind(x,y)
> 
> x <- c(266500,2611111)
> y <- c(6478767,6485013)
> data1<-cbind(x,y)
> 
> Any suggestions on how to do this would be appreciated.
> 
> Regards
> 
> Andrew

If you only need, for each point in data1, the closest point in data2, 
then use knn1 (or knn with k = 1) in the 'class' package:

library(class)
nn <- knn1(data2, data1, 1:nrow(data2))

which gives, for each row of data1, the row index of the closest point 
in data2 (returned as a factor). Then compute the distance:

sqrt(rowSums((data2[as.integer(as.character(nn)), , drop = FALSE] - data1)^2))
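
If you want all of the roughly 5 * 20,000 distances rather than only the 
nearest neighbour, something along these lines in base R should avoid the 
full 20,000 * 20,000 matrix. This is only a sketch: cross_dist is a made-up 
helper name, not a function from any package, and the data are the example 
values copied as posted.

```r
# Hypothetical helper (not from any package): Euclidean distance from every
# row of `a` to every row of `b`. Both are two-column matrices (x, y).
# The result is nrow(a) x nrow(b), so only the cross distances are stored.
cross_dist <- function(a, b) {
  dx <- outer(a[, 1], b[, 1], "-")  # pairwise differences in x
  dy <- outer(a[, 2], b[, 2], "-")  # pairwise differences in y
  sqrt(dx^2 + dy^2)
}

# Example data as posted in the question
data2 <- cbind(x = c(2660156, 2663703, 2658165, 2659303, 2661531, 2660914),
               y = c(6476767, 6475013, 6475487, 6479659, 6477004, 6476388))
data1 <- cbind(x = c(266500, 2611111),
               y = c(6478767, 6485013))

d <- cross_dist(data1, data2)       # 2 x 6 here; about 5 x 20,000 in practice

# Nearest point in data2 for each row of data1, and its distance
nearest      <- apply(d, 1, which.min)
nearest_dist <- d[cbind(seq_len(nrow(d)), nearest)]
```

Row i of d holds the distances from data1[i, ] to every point in data2, so 
nearest should agree with the knn1 result above unless there are ties.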

HTH,

--sundar


