[Rd] Match .3 in a sequence

Stavros Macrakis macrakis at alum.mit.edu
Tue Mar 17 00:39:23 CET 2009


The factor approach is horrifically ugly and dangerous.

Even if it didn't have the extraordinarily poor behavior documented
below, what it should do simply isn't well defined.  The explicit
approximation route is far, far preferable in every way: more
predictable, more controllable, and even (though it usually hardly
matters) faster.
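
For concreteness, here is a minimal sketch of the approximation route as a
tolerant stand-in for match().  The name approx_match and its default
tolerance are illustrative assumptions, not part of base R, and it handles
a single x only.  The table is built by hand as .2 + .1 rather than with
seq(), since how seq() rounds its endpoint may differ between R versions.

```r
# Hypothetical helper (not in base R): like match() for a single value x,
# but accepts any element of `table` within absolute tolerance `tol`.
approx_match <- function(x, table, tol = 1e-8) {
  hits <- which(abs(table - x) <= tol)             # all close-enough positions
  if (length(hits) > 0) hits[1] else NA_integer_   # first hit, or NA as match() does
}

tbl <- c(.2, .2 + .1)    # second element is computed, so not bit-identical to .3
match(.3, tbl)           # exact comparison
approx_match(.3, tbl)    # comparison within tol
```

The tolerance is an explicit argument, so the caller decides what "close
enough" means instead of relying on how the numbers happen to print.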

Let's look at the extraordinarily poor behavior I was mentioning. Consider:

nums <- (.3 + 2e-16 * c(-2,-1,1,2)); nums
[1] 0.3 0.3 0.3 0.3

Though they all print as .3 with the default precision (which is
normal and expected), they are all different from .3:

nums - .3 =>  -3.885781e-16 -2.220446e-16  2.220446e-16  3.885781e-16
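
One way to see this directly, as a small sketch using only base R, is to
ask for more digits than the default and to compare exactly:

```r
nums <- .3 + 2e-16 * c(-2, -1, 1, 2)

print(nums, digits = 17)   # request up to 17 significant digits
sprintf("%.17g", nums)     # the same idea, as strings
nums == .3                 # exact comparison against .3
length(unique(nums))       # number of distinct values
```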

When we convert nums to a factor, we get:

fact <- as.factor(nums); fact
[1] 0.300000000000000 0.3               0.3               0.300000000000000
Levels: 0.300000000000000 0.3 0.3 0.300000000000000

Not clear what the difference between 0.300000000000000 and 0.3 is
supposed to be, nor why some 0.300000000000000 are < .3 and others are
> .3, but let's put that aside for the moment.

Now let's look at the relations among the factor values:

fact[1]==fact[2]
[1] FALSE
fact[1]==fact[4]
[1] TRUE

So though nums[1] < nums[2] < nums[3] < nums[4], fact[1] compares
*unequal* to fact[2] though it compares *equal* to fact[4].
Apparently R is comparing the *names* of the levels rather than the
indexes in the factor.  This would be weird even if it didn't lead to
this very bad case.
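
A small check, independent of the floating-point example, is consistent
with that reading.  The two factors below are made up for illustration:
they hold the same label with different integer codes.  (The exact labels
as.factor produces for nums may vary between R versions, so this sketch
avoids them.)

```r
a <- factor("x", levels = c("x", "y"))   # here "x" is stored as code 1
b <- factor("x", levels = c("y", "x"))   # here "x" is stored as code 2

as.integer(a) == as.integer(b)   # compares the integer codes
a == b                           # compares the factors themselves
```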

Hope this helps,

             -s


On Mon, Mar 16, 2009 at 6:53 PM, Daniel Murphy <chiefmurphy at gmail.com> wrote:
> I have a matrix whose columns were filled with values which were functions
> of cvseq<-seq(.2,.3,by=.1) (and a row value of mode integer). To do a lookup
> for cv=.3 later, I wanted to match(.3,cvseq), which gave me NA, hence my
> question. I thought R would match .3 in cvseq within .Machine$double.eps,
> but I can understand it if .3 and the second element of cvseq would not have
> identical bits.
> Besides the helpful suggestions below, I also tried
>> cvseqf <- as.factor(cvseq)
>> match(.3,cvseqf)
> [1] 2
> which worked.
> In general, would it be better to go the enumeration route via as.factor or
> the approximation route?
> Thanks for the help.
> -Dan
>
> On Mon, Mar 16, 2009 at 8:24 AM, Stavros Macrakis <macrakis at alum.mit.edu>
> wrote:
>>
>> Well, first of all, seq(from=.2,to=.3) gives c(0.2), so I assume you
>> really mean something like seq(from=.2,to=.3,by=.1), which gives
>> c(0.2, 0.3).
>>
>> %in% tests for exact equality, which is almost never a good idea with
>> floating-point numbers.
>>
>> You need to define what exactly you mean by "in" for floating-point
>> numbers.  What sort of tolerance are you willing to allow?
>>
>> Some possibilities would be for example:
>>
>> approxin <- function(x,list,tol) any(abs(list-x)<tol)   # absolute tolerance
>>
>> rapproxin <- function(x,list,tol) (x==0 && 0 %in% list) ||
>> any(abs((list-x)/x)<=tol,na.rm=TRUE)
>>     # relative tolerance; only exact 0 will match 0
>>
>> Hope this helps,
>>
>>          -s
>>
>> On Mon, Mar 16, 2009 at 9:36 AM, Daniel Murphy <chiefmurphy at gmail.com>
>> wrote:
>> > Hello: I am trying to match the value 0.3 in the sequence seq(.2,.3). I
>> > get
>> >> 0.3 %in% seq(from=.2,to=.3)
>> > [1] FALSE
>> > Yet
>> >> 0.3 %in% c(.2,.3)
>> > [1] TRUE
>> > For arbitrary sequences, this "invisible .3" has been problematic. What
>> > is
>> > the best way to work around this?
>
>


