[R] find closest value in a vector based on another vector values

William Dunlap wdunlap at tibco.com
Tue Jun 18 23:33:39 CEST 2013


> I guess I could have been a little more authoritative: the code
> unique(a[sapply(b,function(x) which.min(abs(x-a)))]) is exactly what I need. 

That method could be written as the following function:
f0 <- function (a, b, unique = TRUE) 
{
    ret <- a[sapply(b, function(x) which.min(abs(x - a)))]
    if (unique) { 
        ret <- unique(ret)
    }
    ret
}
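
As a quick check, here is f0 applied to the two small examples Bert posted
earlier in the thread. The function is repeated so the snippet runs on its
own; the expected values in the comments were worked out by hand from the
absolute differences.

```r
# f0 as defined above, repeated here so the snippet is self-contained.
f0 <- function(a, b, unique = TRUE) {
    # For each element of b, pick the element of a with the smallest
    # absolute difference (which.min returns the first minimum on ties).
    ret <- a[sapply(b, function(x) which.min(abs(x - a)))]
    if (unique) {
        ret <- unique(ret)
    }
    ret
}

# Bert's first example: 33 is nearer to 33.5 than to 32.
f0(c(1, 5, 8, 15, 32, 33.5, 69), c(8.5, 33))   # 8.0 33.5

# Bert's second example: 1 is the nearest value to both 2 and 3.
f0(c(1, 8, 9), 2:3)                   # 1
f0(c(1, 8, 9), 2:3, unique = FALSE)   # 1 1
```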

If 'a' is in sorted order then I think the following, based on findInterval,
does the same thing in less time, especially when 'b' is longish.

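f1 <- function (a, b, unique = TRUE) 
{
    # 'a' must be sorted in increasing order.
    leftI <- findInterval(b, a)    # index of the largest a[i] <= b (0 if none)
    rightI <- leftI + 1            # index of the next element up
    leftI[leftI == 0] <- 1         # b below min(a): only a[1] is a candidate
    rightI[rightI > length(a)] <- length(a)    # b above max(a)
    # '<=' sends exact ties to the left neighbour, as which.min() does in f0
    ret <- ifelse(abs(b - a[leftI]) <= abs(b - a[rightI]), a[leftI],  a[rightI])
    if (unique) { 
        ret <- unique(ret)
    }
    ret
}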

E.g.,

R> a <- sort(rnorm(1e6))
R> b <- sort(rnorm(1000))
R> system.time(r0 <- f0(a, b))
   user  system elapsed 
   4.88    3.48    8.36 
R> system.time(r1 <- f1(a, b))
   user  system elapsed 
      0       0       0 
R> identical(r0, r1)
[1] TRUE
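
To see why the findInterval approach works, it may help to trace the index
arithmetic on the small sorted 'a' from earlier in the thread. The 'b' values
here are made up for illustration (0 and 100 are chosen to fall outside the
range of 'a'), and ties are sent to the left neighbour with '<=', which is an
assumption meant to mirror what which.min does in f0.

```r
a <- c(1, 5, 8, 15, 32, 33.5, 69)   # sorted, from earlier in the thread
b <- c(0, 8.5, 33, 100)             # illustrative; 0 and 100 lie outside range(a)

leftI <- findInterval(b, a)         # 0 3 5 7: largest i with a[i] <= b, 0 if none
rightI <- leftI + 1                 # 1 4 6 8: the next element up
leftI[leftI == 0] <- 1              # b below min(a): clamp to the first element
rightI[rightI > length(a)] <- length(a)   # b above max(a): clamp to the last

# The two candidates for each b; the nearest value in a must be one of them.
cbind(b, left = a[leftI], right = a[rightI])

nearest <- ifelse(abs(b - a[leftI]) <= abs(b - a[rightI]), a[leftI], a[rightI])
nearest                             # 1.0 8.0 33.5 69.0
```

Only two candidates per element of 'b' are ever compared, which is why this
scales so much better than computing abs(x - a) over all of 'a' for each x.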

If 'a' might be unsorted then add
    if (is.unsorted(a))  a <- sort(a)
at the beginning.  If the output must be in the same order as the original
'a' then use order(a) and subscript 'a' and 'ret' with its output.
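
One possible reading of that last suggestion is sketched below. The name f2
is hypothetical, and the sketch assumes 'a' has no duplicated values and that
"same order as the original 'a'" means the matches should be reported in the
order they occur in 'a'. It always returns unique matches.

```r
# f2 is a hypothetical name, not from the thread.
# Assumes 'a' contains no duplicated values.
f2 <- function(a, b) {
    o <- order(a)        # o[i] is the position in 'a' of the i-th smallest value
    s <- a[o]            # sorted copy of 'a'
    leftI <- findInterval(b, s)
    rightI <- leftI + 1
    leftI[leftI == 0] <- 1
    rightI[rightI > length(s)] <- length(s)
    # Index into the sorted copy of the nearest neighbour; ties go left.
    nearI <- ifelse(abs(b - s[leftI]) <= abs(b - s[rightI]), leftI, rightI)
    # Map back to positions in the original 'a', drop repeats, and
    # report the matches in the order they occur in 'a'.
    a[sort(unique(o[nearI]))]
}

a <- c(32, 1, 69, 8, 15, 5)   # unsorted
b <- c(33, 8.5, 2)
f2(a, b)                      # 32 1 8
```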

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Andras Farkas
> Sent: Tuesday, June 18, 2013 10:24 AM
> To: Bert Gunter
> Cc: R mailing list
> Subject: Re: [R] find closest value in a vector based on another vector values
> 
> Bert,
> 
> I guess I could have been a little more authoritative: the code
> unique(a[sapply(b,function(x) which.min(abs(x-a)))]) is exactly what I need. Thanks for the
> input, your comments helped us make the code better,
> 
> Andras
> 
> 
> --- On Tue, 6/18/13, Bert Gunter <gunter.berton at gene.com> wrote:
> 
> > From: Bert Gunter <gunter.berton at gene.com>
> > Subject: Re: [R] find closest value in a vector based on another vector values
> > To: "Andras Farkas" <motyocska at yahoo.com>
> > Cc: "Jorge I Velez" <jorgeivanvelez at gmail.com>, "R mailing list" <r-help at r-project.org>
> > Date: Tuesday, June 18, 2013, 10:55 AM
> > Andras:
> >
> > No.
> > Using the a = c(1,8,9) and b = 2:3 that **I posted before**, you get
> > the single unique value of 1.
> >
> > Please stop guessing, think carefully about what you want to do, and
> > **test** your code.
> >
> > -- Bert
> >
> > On Tue, Jun 18, 2013 at 7:41 AM, Andras Farkas <motyocska at yahoo.com>
> > wrote:
> > > Bert,
> > >
> > > thanks... The values should not repeat themselves if the same a is
> > > closest to all b, so probably Arun's example extended with a unique
> > > command works best?
> > >
> > > unique(a[sapply(b,function(x) which.min(abs(x-a)))])
> > >
> > > thanks,
> > >
> > > Andras
> > >
> > > --- On Tue, 6/18/13, Bert Gunter <gunter.berton at gene.com> wrote:
> > >
> > >> From: Bert Gunter <gunter.berton at gene.com>
> > >> Subject: Re: [R] find closest value in a vector based on another vector values
> > >> To: "Jorge I Velez" <jorgeivanvelez at gmail.com>
> > >> Cc: "Andras Farkas" <motyocska at yahoo.com>, "R mailing list" <r-help at r-project.org>
> > >> Date: Tuesday, June 18, 2013, 10:07 AM
> > >> Jorge: No.
> > >>
> > >> > a <-c(1,5,8,15,32,33.5,69)
> > >> > b <-c(8.5,33)
> > >> > a[findInterval(b, a)]
> > >> [1]  8 32  ## should be 8 33.5
> > >>
> > >> I believe it has to be done explicitly by finding all the differences
> > >> and choosing those n with minimum values, depending on what n you
> > >> want.
> > >>
> > >> Note that the problem is incompletely specified. What if the same
> > >> value of a is closest to several values of b? -- do you want all the
> > >> values you choose to be different or not, in which case they may not
> > >> be minimum?
> > >>
> > >> a <- c(1, 8, 9)
> > >> b <- c(2,3)
> > >>
> > >> Then what are the 2 closest values of a to b?
> > >>
> > >> -- Bert
> > >>
> > >> On Tue, Jun 18, 2013 at 5:43 AM, Jorge I Velez <jorgeivanvelez at gmail.com> wrote:
> > >> > Dear Andras,
> > >> >
> > >> > Try
> > >> >
> > >> >> a[findInterval(b, a)]
> > >> > [1]  8 32
> > >> >
> > >> > HTH,
> > >> > Jorge.-
> > >> >
> > >> >
> > >> > On Tue, Jun 18, 2013 at 10:34 PM, Andras Farkas <motyocska at yahoo.com> wrote:
> > >> >
> > >> >> Dear All,
> > >> >>
> > >> >> would you please provide your thoughts on the following:
> > >> >> let us say I have:
> > >> >>
> > >> >> a <-c(1,5,8,15,32,69)
> > >> >> b <-c(8.5,33)
> > >> >>
> > >> >> and I would like to extract from "a" the two values that are closest to
> > >> >> the values in "b", where the length of these vectors may change but b will
> > >> >> always be shorter than "a". So at the end, based on this example, I should
> > >> >> have the result "f" as
> > >> >>
> > >> >> f <-c(8,32)
> > >> >>
> > >> >> appreciate the help,
> > >> >>
> > >> >> Andras
> > >> >>
> > >> >>
> > >> >> ______________________________________________
> > >> >> R-help at r-project.org mailing list
> > >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > >> >> and provide commented, minimal, self-contained, reproducible code.
> > >> >>
> > >> >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >>
> > >> Bert Gunter
> > >> Genentech Nonclinical Biostatistics
> > >>
> > >> Internal Contact Info:
> > >> Phone: 467-7374
> > >> Website:
> > >> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
> > >>
> 


