[Rd] identical(0, -0)

Mon Aug 10 16:20:39 CEST 2009

On 8/10/2009 9:55 AM, Simon Urbanek wrote:
> On Aug 10, 2009, at 5:47 , Duncan Murdoch wrote:
> 
>> Petr Savicky wrote:
>>> On Sat, Aug 08, 2009 at 10:39:04AM -0400, Prof. John C Nash wrote:
>>>
>>>> I'll save space and not include previous messages.
>>>>
>>>> My 2 cents: At the very least the documentation needs a fix. If it  
>>>> is easy to do, then Ted Harding's suggestion of a switch (default  
>>>> OFF) to check for sign difference would be sensible.
>>>>
>>>> I would urge inclusion in the documentation of the +0, -0  
>>>> example(s) if there is NOT a way in R to distinguish these.
>>>>
>>>
>>> It is possible to distinguish 0 and -0 in R, since 1/0 == Inf and
>>> 1/(-0) == -Inf.
>>>
>>> I do not know whether there are other such situations. In particular,
>>>  (0)^(-1) == (-0)^(-1) # [1] TRUE
>>>  log(0) == log(-0) # [1] TRUE
>>>
>>>
>>>> There are occasions where it is useful to be able to detect things  
>>>> like this (and NaN and Inf and -Inf etc.). They are usually not of  
>>>> interest to users, but sometimes are needed for developers to  
>>>> check edge effects. For those cases it may be time to consider a  
>>>> package FPIEEE754 or some similar name to allow testing and  
>>>> possibly setting of flags for some of the fancier features. Likely  
>>>> used by just a few of us in extreme situations.
>>>>
>>>
>>> I think that distinguishing 0 and -0 may be useful even for nonexpert
>>> users for debugging purposes, mainly because x == y does not imply
>>> that x and y behave identically, as demonstrated above or by
>>>  x <- 0
>>>  y <-  - 0
>>>  x == y # [1] TRUE
>>>  1/x == 1/y # [1] FALSE
>>>
>>> I would like to recall the suggestion
>>>  On Sat, Aug 08, 2009 at 03:04:07PM +0200, Martin Maechler wrote:
>>>  > Maybe we should introduce a function that's basically
>>>  > isTRUE(all.equal(..., tol=0))  {but faster},  or
>>>  > do you want a 3rd argument to identical, say 'method'
>>>  > with default  c("oneNaN", "use.==", "strict")
>>>  >
>>>  > oneNaN: my proposal of using memcmp() on doubles as it's used for
>>>  >         other types already (and hence distinguishing +0 and -0);
>>>  >         otherwise keeping the feature that there's just one NaN
>>>  >         which differs from 'NA' (and there's just one 'NA').
>>>  >
>>>  > use.==: the previous R behaviour, using '==' on doubles
>>>  >         (and the "oneNaN" behavior)
>>>  >
>>>  > strict: be even stricter than oneNaN: use memcmp()
>>>  >         unconditionally for doubles. This would be the fastest
>>>  >         version of all three.
>>>
>>> In my opinion, for debugging purposes, the option
>>> identical(x, y, method="strict"), whose TRUE result would imply that
>>> x and y behave identically, could be useful if it were available in
>>> base R.
>>> At the R interactive level, negative zero as the value of -0 could
>>> possibly be avoided. However, negative zero may also occur in
>>> numerical calculations, since it may be obtained as x * 0, where x is
>>> negative. So, I think, negative zero cannot be dismissed as too
>>> infrequent to matter.
>>
>> I wouldn't mind a "strict" option. It would compare bit patterns,
>> so it would distinguish +0 from -0, and different NaN values. But
>> having the value of identical(x-y, -(y-x)) depend on whether x and
>> y are equal or not would just lead to confusion.
> 
> ... but so do other things routinely, such as floating-point
> arithmetic, so I don't think this is a strong argument here. IMHO
> identical(0, -0) should return FALSE, because they are simply not the
> same objects and that's what identical is supposed to test for. If you
> want to test equality of elements there are other means you should be
> using that were mentioned in this thread.
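
The signed-zero behaviour discussed above can be checked directly at the R prompt. A minimal sketch, assuming IEEE 754 doubles as on all common platforms; is_negative_zero() is a hypothetical helper written only for this illustration, not a base R function. It uses the 1/x trick from Petr's message and reproduces the identical(x-y, -(y-x)) case from Duncan's reply:

```r
# Hypothetical helper (not in base R): TRUE where x is a negative zero.
# It relies on 1/(-0) being -Inf while 1/0 is Inf.
is_negative_zero <- function(x) {
  x == 0 & 1 / x < 0
}

# Negative zero arises from ordinary arithmetic, e.g. a negative number times 0.
z <- -2 * 0
z == 0                 # TRUE: == cannot tell +0 from -0
is_negative_zero(z)    # TRUE

# x - y and -(y - x) are equal in value, but when x == y
# the first is +0 and the second is -0.
x <- 3; y <- 3
a <- x - y             # +0
b <- -(y - x)          # -0
a == b                 # TRUE
c(1 / a, 1 / b)        # Inf -Inf
```

With x and y unequal (say y <- 5), both expressions give the same bit pattern, which is why a purely bitwise identical() would give different answers depending on whether x == y.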

+0 and -0 are exactly equal, which is what identical is documented to be 
testing.  They are not indistinguishable, and not identical in the 
English meaning of the word, but they are identical in the sense of what 
the identical() function is documented to test.

The cases where you want to distinguish between them are rare.  They 
should not be distinguished in the default identical() test, any more 
than different values of NaN should be distinguished (and identical() is 
explicitly documented *not* to distinguish those).
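
For reference, R releases later than this message added arguments to identical() along these lines: num.eq (default TRUE, doubles compared by value; FALSE compares bit patterns) and single.NA (default TRUE, conceptually one NA and one NaN). The sketch below shows that later default and strict behaviour; it assumes a current R and will not run on the R version under discussion in this thread.

```r
# Default: doubles compared by value, so +0 and -0 count as identical.
identical(0, -0)                    # TRUE

# Bitwise comparison of doubles distinguishes the sign of zero.
identical(0, -0, num.eq = FALSE)    # FALSE

# By default there is just one NaN, and it differs from NA.
identical(NaN, 0 / 0)               # TRUE
identical(NaN, NA_real_)            # FALSE
```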

Of the 1600 uses of identical() in the R base plus recommended packages, 
there are lots of cases where equality of elements is clearly the 
intention.  There are almost no uses of the all.equal(..., tol=0) idiom 
in base R, and among the recommended packages, only Matrix uses it (but 
uses identical() for values as well, I think.)
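
The all.equal(..., tol=0) idiom mentioned above compares values, so it treats +0 and -0 as equal while still rejecting any nonzero difference. A small sketch; exactly_equal() is a hypothetical wrapper name used only here, and tolerance is the full name of the argument that tol partially matches:

```r
# Hypothetical wrapper around the all.equal(..., tolerance = 0) idiom.
exactly_equal <- function(x, y) isTRUE(all.equal(x, y, tolerance = 0))

exactly_equal(0, -0)          # TRUE: the values are equal
exactly_equal(1, 1 + 1e-10)   # FALSE: any difference counts
```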

Distinguishing between different NaN values might be harmless, because 
we probably only generate one.  (I'm not sure about that, the literal 
NaN might be different from sqrt(-1) or 0/0.  But I'd guess only one 
comes up in normal usage.)  But we definitely generate both +0 and -0 
all the time, and distinguishing between them would mean identical() 
would be useless for value-based comparison.  Do you want to evaluate 
all 1600 uses in the base and recommended packages, and who knows how
many on CRAN, to figure out which ones should be changed to 
all.equal(..., tol=0)?  I don't.

Duncan Murdoch