[R] Nearest Neighbor Algorithm in R -- again.

Hans W Borchers Venherm.Borchers at t-online.de
Mon Feb 2 22:25:56 CET 2004

Several of the methods I use for analyzing large data sets, such as

  WinGamma: determining the level of noise in data
  Relief-F: estimating the influence of variables

depend on finding the k nearest neighbors of a point in a data frame or
matrix efficiently. (For large data sets it is not feasible to compute
the 'dist' matrix anyway.)

Seeing the proposed solution to "[R] distance between two matrices"
last month for finding _one_ nearest neighbor I came up with a solution
'nearest(A, n, k)' as appended.

Still, this is very clumsy and slow if you have to find the 3 nearest
neighbors for 1000 points in a data frame with 100000 entries at least
-- about 2 secs per data point on my computer or half an hour for an
application from real life.

Are there better/faster ways to perform this task using R functions?
Even better, is there a free implementation of kd-trees that I could
utilize (the one I found did not conform to the ANSI C standard)?

Someome pointed to 'spdep' of the R-Sig-Geo project, but 'knearneigh'
only accepts 2D data points.

Hans Werner Borchers
ABB Corporate Research, Germany

require (class)
nearest <- function (X, n, k=3)
##  Find k nearest neighbors of X[n, ] in the data frame
##  or matrix X, utilizing function knn1 k-times.
    N <- nrow(X)
    # inds contains the indices of nearest neighbors
    inds <- c(n); i <- 0
    while (i < k) {
        # use knn1 for one index...
        j  <- as.integer(knn1(X [-inds, ], X[n, ], 1:(N-length(inds))))
        # ...and change to true index of neighbor
        inds <- c(inds, setdiff(1:N, inds)[j])
        i <- i+1
    # return nearest neighbor indices (without n, of course)

More information about the R-help mailing list