[R] Join data by minimum distance

Simon Knapp sleepingwell at gmail.com
Sun Sep 14 15:06:22 CEST 2008


> I am wondering if there is a function which will do a join between 2 data.frames by minimum distance, as is done in ArcGIS for example. For people who are not familiar with ArcGIS, here is an explanation:
>
> Suppose you have a data.frame with x, y coordinates called track, and a second data.frame with different x, y coordinates and some other attributes called classif. The track data.frame has a different number of rows than classif. I want to join the rows from classif to track in such a way that for each row in track I add only the row from classif whose coordinates are closest to the coordinates in the track row (and hence have the minimum distance between the 2 rows), and also add a new column which records this minimum distance. Even if the coordinate columns in the 2 data.frames have the same names, the values are not identical between the data.frames, so a merge by column is not what I am after.



#-----------------------------------------------------------------------
# get the distance between two points on the globe.
#
# args:
# lat1 - latitude of first point.
# long1 - longitude of first point.
# lat2 - latitude of second point.
# long2 - longitude of second point.
# radius - average radius of the earth in km
#
# see: http://en.wikipedia.org/wiki/Great_circle_distance
#-----------------------------------------------------------------------
greatCircleDistance <- function(lat1, long1, lat2, long2, radius=6372.795){
    sf <- pi/180
    lat1 <- lat1*sf
    lat2 <- lat2*sf
    long1 <- long1*sf
    long2 <- long2*sf
    lod <- abs(long1-long2)
    radius * atan2(
        sqrt((cos(lat1)*sin(lod))^2 +
            (cos(lat2)*sin(lat1)-sin(lat2)*cos(lat1)*cos(lod))^2),
        sin(lat2)*sin(lat1)+cos(lat2)*cos(lat1)*cos(lod)
    )
}
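
A quick way to check the units and the argument order (latitude first, then longitude, both in degrees) is to compare against cases with a known answer. The sketch below is not part of the function above: gcdCosine is a hypothetical helper using the simpler spherical law of cosines, which is less accurate for very small distances but easy to verify by eye. A quarter of a great circle should come out as radius * pi/2.

```r
# Hypothetical cross-check helper (an assumption, not from the original post):
# great circle distance via the spherical law of cosines, inputs in degrees.
gcdCosine <- function(lat1, long1, lat2, long2, radius=6372.795){
    sf <- pi/180
    radius * acos(sin(lat1*sf)*sin(lat2*sf) +
        cos(lat1*sf)*cos(lat2*sf)*cos((long1-long2)*sf))
}

# a quarter of the way round the equator
quarter <- gcdCosine(0, 0, 0, 90)
# from the equator to the north pole, also a quarter of a great circle
pole <- gcdCosine(0, 0, 90, 0)
# both should agree with radius * pi/2
isTRUE(all.equal(quarter, 6372.795 * pi/2))
isTRUE(all.equal(pole, 6372.795 * pi/2))
```

With greatCircleDistance defined as above, greatCircleDistance(0, 0, 0, 90) should give the same value.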

#-----------------------------------------------------------------------
# For each row of x, find the nearest row of y using latitude and
# longitude, and attach that row's other columns and the distance
# to it.
#
# args:
# x data.frame, as you describe 'track'
# y data.frame, as you describe 'classif'
# xlongnme name of longitude variable in x
# xlatnme name of latitude variable in x
# ylongnme name of longitude variable in y
# ylatnme name of latitude variable in y
#-----------------------------------------------------------------------
dist.merge <- function(x, y, xlongnme, xlatnme, ylongnme, ylatnme){
    # for each row of x: index of, and distance to, the nearest row of y
    tmp <- t(apply(x[,c(xlongnme, xlatnme)], 1, function(x, y){
        dists <- apply(y, 1, function(x, y)
            greatCircleDistance(x[2], x[1], y[2], y[1]), x)
        # first row wins if several rows of y are equally close
        cbind(1:nrow(y), dists)[dists == min(dists),,drop=FALSE][1,]
    }
    , y[,c(ylongnme, ylatnme)]))
    # x, the minimum distance, and the non-coordinate columns of y
    tmp <- cbind(x, min.dist=tmp[,2],
        y[tmp[,1],-match(c(ylongnme, ylatnme), names(y)),drop=FALSE])
    row.names(tmp) <- NULL
    tmp
}

# demo
# longitudes in [0, 360), latitudes in [-90, 90]; classif has 20 rows
track <- data.frame(xt=runif(10,0,360), yt=runif(10,-90,90))
classif <- data.frame(xc=runif(20,0,360), yc=runif(20,-90,90),
    v1=letters[1:20], v2=1:20)
dist.merge(track, classif, 'xt', 'yt', 'xc', 'yc')
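
If the nested apply calls become slow on larger data, the same join can be sketched by building the full distance matrix in one vectorised step. This is only an illustration under stated assumptions: nearestJoin and haversine are hypothetical names that do not appear above, and the haversine formula is swapped in for the atan2 form because it vectorises compactly. The matrix has nrow(x) * nrow(y) entries, so this trades memory for speed.

```r
# Hypothetical helper (an assumption): haversine distance in km, inputs in
# degrees, vectorised over all four coordinate arguments.
haversine <- function(lat1, long1, lat2, long2, radius=6372.795){
    sf <- pi/180
    dlat <- (lat2 - lat1)*sf
    dlong <- (long2 - long1)*sf
    a <- sin(dlat/2)^2 + cos(lat1*sf)*cos(lat2*sf)*sin(dlong/2)^2
    2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical alternative to dist.merge, same arguments.
nearestJoin <- function(x, y, xlongnme, xlatnme, ylongnme, ylatnme){
    # nrow(x) by nrow(y) matrix of distances
    d <- outer(seq_len(nrow(x)), seq_len(nrow(y)), function(i, j)
        haversine(x[i, xlatnme], x[i, xlongnme], y[j, ylatnme], y[j, ylongnme]))
    # for each row of x, the index of the closest row of y (first on ties)
    idx <- apply(d, 1, which.min)
    out <- cbind(x, min.dist=d[cbind(seq_len(nrow(x)), idx)],
        y[idx, setdiff(names(y), c(ylongnme, ylatnme)), drop=FALSE])
    row.names(out) <- NULL
    out
}

# small hand-checkable example: the first track point is closest to 'b',
# the second to 'a'
track <- data.frame(xt=c(10, 100), yt=c(0, 45))
classif <- data.frame(xc=c(99, 11, 200), yc=c(44, 1, -30), v1=c("a", "b", "c"))
res <- nearestJoin(track, classif, "xt", "yt", "xc", "yc")
res
```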


