[Rd] Match .3 in a sequence

William Dunlap wdunlap at tibco.com
Wed Mar 18 19:49:59 CET 2009


> -----Original Message-----
> From: r-devel-bounces at r-project.org 
> [mailto:r-devel-bounces at r-project.org] On Behalf Of Duncan Murdoch
> Sent: Tuesday, March 17, 2009 12:15 PM
> To: Daniel Murphy
> Cc: r-devel at r-project.org
> Subject: Re: [Rd] Match .3 in a sequence
> 
> On 3/17/2009 11:26 AM, Daniel Murphy wrote:
> > Is this a reasonably fast way to do an approximate match of 
> a vector x to
> > values in a list?
> > 
> > match.approx  <- function(x,list,tol=.0001)
> >     sapply(apply(abs(outer(list,x,"-"))<tol,2,which),"[",1)
> 
> If you are willing to assume that the list values are all 
> multiples of 
> 2*tol, then it's easy:  just divide both x and list by 2*tol, 
> round to 
> nearest integer, and use the regular match function.
> 
> If not, it becomes harder; I'd probably use a solution like yours.
> 
> Duncan Murdoch

Here are 2 other implentations of that match.approx function
which use much less memory (and are faster) when the length
of 'x' and 'list' are long (>100, say).  The first uses 
approx(method="const") to figure out which entries in the
list are just below and above each entry in x and the second 
uses sorting tricks to do the same thing.  Then you only have
to figure out if the closest of those 2 entries is close enough.

The original one above fails when tol>min(diff(sort(list))).

match.approx2 <-
function(x,list,tol=.0001) {
    o1 <- rep.int(c(FALSE,TRUE),
c(length(x),length(list)))[order(c(x,list))]
    o2 <- rep.int(c(FALSE,TRUE),
c(length(x),length(list)))[order(c(x,list))]
   
    below <- approx(list, list, xout=x, method="constant", f=0)$y
    above <- approx(list, list, xout=x, method="constant", f=1)$y
    stopifnot(all(below<=x, na.rm=TRUE), all(above>=x, na.rm=TRUE))
    closestInList <- ifelse(x-below < above-x, below, above)
    closestInList[x<min(list)] <- min(list)
    closestInList[x>max(list)] <- max(list)
    closestInList[abs(x-closestInList)>tol] <- NA
    match(closestInList, list)
}
match.approx3 <-
function(x, list, tol=.0001){
    stopifnot(length(list)>0, !any(is.na(x)), !any(is.na(list)))
    oox <- order(order(x)) # essentially rank(x)
    i <- rep(c(FALSE,TRUE), c(length(x),length(list)))[order(c(x,
list))]
    i <- cumsum(i)[!i] + 1L
    i[i > length(list)] <- NA
    i <- order(list)[i]
    leastUpperBound <- i[oox]
    i <- rep(c(TRUE,FALSE), c(length(list),length(x)))[order(c(list,
x))]
    i <- cumsum(i)[!i]
    i[i < 1L] <- NA
    i <- order(list)[i]
    greatestLowerBound <- i[oox]
    closestInList <-
        ifelse(is.na(greatestLowerBound),
            leastUpperBound, # above max(list)
            ifelse(is.na(leastUpperBound),
                greatestLowerBound, # below min(list)
 
ifelse(x-list[greatestLowerBound]<list[leastUpperBound]-x,
                    greatestLowerBound,
                    leastUpperBound)))
    if (tol<Inf)
        closestInList[abs(x - list[closestInList])>tol] <- NA
    closestInList
}

> > 
> > Thanks.
> > -Dan
> > 
> > On Mon, Mar 16, 2009 at 8:24 AM, Stavros Macrakis 
> <macrakis at alum.mit.edu>wrote:
> > 
> >> Well, first of all, seq(from=.2,to=.3) gives c(0.2), so I 
> assume you
> >> really mean something like seq(from=.2,to=.3,by=.1), which gives
> >> c(0.2, 0.3).
> >>
> >> %in% tests for exact equality, which is almost never a 
> good idea with
> >> floating-point numbers.
> >>
> >> You need to define what exactly you mean by "in" for floating-point
> >> numbers.  What sort of tolerance are you willing to allow?
> >>
> >> Some possibilities would be for example:
> >>
> >> approxin <- function(x,list,tol) any(abs(list-x)<tol)   # absolute
> >> tolerance
> >>
> >> rapproxin <- function(x,list,tol) (x==0 && 0 %in% list) ||
> >> any(abs((list-x)/x)<=tol,na.rm=TRUE)
> >>     # relative tolerance; only exact 0 will match 0
> >>
> >> Hope this helps,
> >>
> >>          -s
> >>
> >> On Mon, Mar 16, 2009 at 9:36 AM, Daniel Murphy 
> <chiefmurphy at gmail.com>
> >> wrote:
> >> > Hello:I am trying to match the value 0.3 in the sequence 
> seq(.2,.3). I
> >> get
> >> >> 0.3 %in% seq(from=.2,to=.3)
> >> > [1] FALSE
> >> > Yet
> >> >> 0.3 %in% c(.2,.3)
> >> > [1] TRUE
> >> > For arbitrary sequences, this "invisible .3" has been 
> problematic. What
> >> is
> >> > the best way to work around this?
> >>
> > 
> > 	[[alternative HTML version deleted]]
> > 
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 



More information about the R-devel mailing list