[R] Vectorizing closest match

Mike Lonergan mel at mcs.st-and.ac.uk
Thu Mar 28 17:55:27 CET 2002


Frank,

Apologies if I'm being very stupid, but I'd guess that reducing the overall
complexity would help for large datasets. The following is not vectorised,
but should be linear apart from the order() operations, which are hopefully
O(nlogn):


>closest<-function(w,x,y)  {
   ow<-order(w)
   ox<-order(x)
   ws<-w[ow]
   xs<-x[ox]
   ys<-y[ox]
   res<-rep(length(x),length(w))
   i<-1
   j<-1

   while(i<=length(w) && j<length(x))   {
      if ( ws[i]<=(xs[j]+xs[j+1])/2 )  {
         res[i]<-j
         i<-i+1
      } else
         j<-j+1
   }
   reshuffle<-order((1:length(w))[ow])
   z<-ys[res[reshuffle]]
   return(z)
 }


It is all based on sorting the data first so that only m+n comparisons are
needed rather than mn.

I'd be glad to know whether I've messed up/completely misunderstood the
problem (& I've just realised you're more interested in space than time so
this may all be completely irrelevant.)

Cheers,

Mike.


     > -----Original Message-----
     > From: owner-r-help at stat.math.ethz.ch
     > [mailto:owner-r-help at stat.math.ethz.ch]On Behalf Of Frank E
     > Harrell Jr
     > Sent: 28 March 2002 13:20
     > To: rhelp
     > Subject: [R] Vectorizing closest match
     >
     >
     > If anyone has a very fast vectorized method for doing the
     > following I would appreciate some help.  I want to avoid
     > outer() to limit memory problems for very large n.
     >
     > Let
     >
     > x = real vector of length n
     > y = real vector of length n
     > w = real vector of length m, m typically less than n/2 but can be > n
     > z = real vector of length m
     >
     > For w[i], i=1,,,m, find the value of x that is closest to
     > w[i].  In the case of ties, select one (optimally at random
     > or just take the first match).  Let z[i] = value of y
     > corresponding to the closest x.
     >
     > If there is a Fortran or C function (builtin or otherwise)
     > that does much of the work, all the better.
     >
     > Thanks,
     >
     > Frank
     > --
     > Frank E Harrell Jr              Prof. of Biostatistics & Statistics
     > Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
     > U. Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
_._

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._



More information about the R-help mailing list