[Rd] identical(0, -0)

Fri Aug 7 16:46:23 CEST 2009

>>>>> "TH" == Ted Harding <Ted.Harding at manchester.ac.uk>
>>>>>     on Fri, 07 Aug 2009 14:49:54 +0100 (BST) writes:

    TH> On 07-Aug-09 11:07:08, Duncan Murdoch wrote:
    >> Martin Maechler wrote:
    >>>>>>>> William Dunlap <wdunlap at tibco.com>
    >>>>>>>> on Thu, 6 Aug 2009 15:06:08 -0700 writes:
    >>> >> -----Original Message-----
    >>> >> From: r-help-bounces at r-project.org
    >>> >> [mailto:r-help-bounces at r-project.org] On Behalf Of Giovanni Petris
    >>> >> Sent: Thursday, August 06, 2009 3:00 PM
    >>> >> To: milton.ruser at gmail.com
    >>> >> Cc: r-help at r-project.org; Daniel.Gerlanc at geodecapital.com
    >>> >> Subject: Re: [R] Why is 0 not an integer?
    >>> >> 
    >>> >> 
    >>> >> I ran an instant experiment...
    >>> >> 
    >>> >> > typeof(0)
    >>> >> [1] "double"
    >>> >> > typeof(-0)
    >>> >> [1] "double"
    >>> >> > identical(0, -0)
    >>> >> [1] TRUE
    >>> >> 
    >>> >> Best, Giovanni
    >>> 
    >>> > But 0.0 and -0.0 have different reciprocals
    >>> 
    >>> >> 1.0/0.0
    >>> >    [1] Inf
    >>> >> 1.0/-0.0
    >>> >    [1] -Inf
    >>> 
    >>> > Bill Dunlap
    >>> > TIBCO Software Inc - Spotfire Division
    >>> > wdunlap tibco.com
    >>> 
    >>> yes.  {finally something interesting in this boring thread !}
    ---> diverting to R-devel
    >>> 
    >>> In April, I had a private e-mail exchange with John
    >>> Chambers [father of S, notably S4, which also brought identical()]
    >>> and Bill about the topic,
    >>> where I suggested that R should be changed such
    >>> that
    >>> identical(-0. , +0.)
    >>> would return FALSE.
    >>> Bill did mention that it does so for (newish versions of) S+
    >>> and that he'd prefer that, too,
    >>> and John said
    >>> 
    >>> >> I agree on having a preference for a bitwise comparison for
    >>> >> identical()---that's what the name means after all.  But since
    >>> >> someone implemented the numerical case as the C == it's probably
    >>> >> going to be more hassle than it's worth to change it.  But we
    >>> >> should make the implementation clear in the documentation.
    >>> 
    >>> so in principle, we all agreed that R's identical() should be
    >>> changed here, namely by using something like  memcmp() instead
    >>> of simple '==' ,  however we haven't bothered to actually 
    >>> *implement* this change.
    >>> 
    >>> I am currently testing a patch which would lead to
    >>> identical(0, -0) returning FALSE.
    >>> 
    >> I don't think that would be a good idea.  Other expressions besides
    >> "-0" 
    >> calculate the zero with the negative sign bit, e.g. the following
    >> sequence:
    >> 
    >> pos <- 1
    >> neg <- -1
    >> zero <- 0
    >> y <- zero*pos
    >> z <- zero*neg
    >> identical(y, z)
    >> 
    >> I think most R users would expect the last expression there to be
    >> TRUE based on the previous two lines, given that pos and neg both
    >> have finite values. In a simple case like this y == z would be a
    >> better test to use, but if those were components of a larger
    >> structure, identical() is all we've got, and people would waste a
    >> lot of time tracking down why structures differing only in the
    >> sign of zero were not identical, even though every element tested
    >> equal.

identical() is *not* the same as '==', even if you think of a
generalized '==',
and your example does not convince me.

Further note that help(identical)  has always said

 > Description:

 >    The safe and reliable way to test two objects for being _exactly_
 >    equal.  It returns 'TRUE' in this case, 'FALSE' in every other case.

which really means it should distinguish -0 and +0.
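
To make the disagreement concrete, here is a minimal C sketch of the
example quoted above, assuming IEEE 754 doubles. The helper names
equal_but_sign_differs() and reciprocal() are hypothetical and are not
part of R's sources. The two zeros compare equal with '==', yet their
sign bits differ and so do their reciprocals.

```c
#include <math.h>
#include <stdbool.h>

/* True when a and b compare equal with == but carry different sign bits. */
bool equal_but_sign_differs(double a, double b)
{
    return a == b && (signbit(a) != 0) != (signbit(b) != 0);
}

/* Reciprocal; under IEEE 754 arithmetic 1/+0 is +Inf and 1/-0 is -Inf. */
double reciprocal(double x)
{
    return 1.0 / x;
}
```

With y = 0.0 * 1.0 and z = 0.0 * -1.0, equal_but_sign_differs(y, z)
holds, which is exactly the case where "equal" and "exactly equal" part
ways.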

    >> Duncan Murdoch
    >>> Martin Maechler, ETH Zurich

    TH> My own view of this is that there may in certain circumstances be an
    TH> interest in distinguishing between 0 and (-0), yet normally most
    TH> users will simply want to compare the numerical values.

    TH> Therefore I am in favour of revising identical() so that it can make
    TH> this distinction; but also of taking the opportunity to give it a
    TH> parameter, say

    TH> identical(x,y,sign.bit=FALSE)

    TH> so that the default behaviour would be to see 0 and (-0) as identical,
    TH> but with sign.bit=TRUE it would see the difference.

    TH> However, I put this forward in ignorance of
    TH> a) Any difficulties that this may present in re-coding identical();
    TH> b) Any complications that may arise when applying this new form
    TH> to complex objects.

Your proposal would actually need to special-case this one case,
whereas my patch simply replaces '==' (in C) for doubles with
memcmp(), which is already used for several other cases there,
and hence seems more consistent and, in that way, natural.
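
The difference between the two comparisons can be sketched in C as
follows. This is only an illustration of '==' versus memcmp() on a
single double, not the patch itself; the function names same_value()
and same_bits() are made up for this example.

```c
#include <stdbool.h>
#include <string.h>

/* Value comparison: +0.0 and -0.0 compare equal. */
bool same_value(double x, double y)
{
    return x == y;
}

/* Bitwise comparison: true only if every byte of the two doubles matches. */
bool same_bits(double x, double y)
{
    return memcmp(&x, &y, sizeof(double)) == 0;
}
```

Here same_value(0.0, -0.0) is true while same_bits(0.0, -0.0) is false,
because the two zeros differ in their sign bit.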

The one thing even the new code would not differentiate is the
different NaNs (apart from NA), but they are not distinguishable
at the R level either, AFAIK. At least AFAIU our language
specifications, we only want two things: NA and NaN.
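
A plain memcmp() would tell NaNs with different bit patterns apart, so
a comparison that treats them alike has to check for NaN first. The
following C sketch shows one way this could look, assuming IEEE 754
doubles. The names nan_with_payload(), same_bits() and
identical_double() are hypothetical, and the sketch deliberately does
not model the separate treatment of NA.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Build a quiet NaN carrying the given payload in its low mantissa bits. */
double nan_with_payload(uint32_t payload)
{
    uint64_t bits = UINT64_C(0x7FF8000000000000) | payload;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Bitwise comparison of two doubles. */
bool same_bits(double x, double y)
{
    return memcmp(&x, &y, sizeof(double)) == 0;
}

/* Assumed rule for this sketch: any two NaNs count as identical;
   everything else is compared bit for bit. */
bool identical_double(double x, double y)
{
    if (isnan(x) || isnan(y))
        return isnan(x) && isnan(y);
    return same_bits(x, y);
}
```

Two NaNs built with different payloads differ under same_bits() but are
reported as identical by identical_double(), while 0.0 and -0.0 are
still told apart.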

Martin