[Rd] identical(0, -0)

Wed Aug 12 18:20:13 CEST 2009

On Wed, Aug 12, 2009 at 04:02:28PM +0200, Martin Maechler wrote:
> >>>>> "PS" == Petr Savicky <savicky at cs.cas.cz>
> >>>>>     on Wed, 12 Aug 2009 13:50:46 +0200 writes:
> 
>     PS> Let me add the following to the discussion of identical(0, -0).
>     PS> I would like to suggest replacing the paragraph
> 
>     PS> 'identical' sees 'NaN' as different from 'NA_real_', but all
>     PS> 'NaN's are equal (and all 'NA' of the same type are equal).
> 
>     PS> in ?identical by the following text, which is a correction of my previous
>     PS> suggestion for the same paragraph
> 
>     > Components of numerical objects are compared as follows. For non-missing
>     > values, "==" is used. In particular, '0' and '-0' are considered equal. 
>     > All 'NA's of the same type are equal and all 'NaN's are equal, although 
>     > their bit patterns may differ in some cases. 'NA' and 'NaN' are always 
>     > different. 
>     > Note also that 1/0 and 1/(-0) are different.
> 
> the word 'numerical' would have to be qualified ('double', and 'complex'
> via double), since memcmp() is indeed used on integers.
> 
> The last sentence is not necessary and probably even confusing:
> Of course, -Inf and Inf are different.

I agree.

>     PS> I suggest that the default of identical(0, -0) be TRUE, because the
>     PS> negative zero is much less important than NA and NaN and, possibly,
>     PS> distinguishing 0 and -0 could even be deprecated.
> 
> What should that mean??  R *is* using the international floating
> point standards, and 0 and -0 exist there and they *are*
> different!

Sorry for being too brief. In my opinion, distinguishing 0 and -0 is
not useful enough to justify making the default behavior of identical()
differ from that of == in this case.
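To make the 0 / -0 point concrete, here is a small C sketch of the IEEE 754 facts involved, assuming IEEE doubles and a compiler not run in a fast-math mode. The names same_bits() and check_zero_and_nan_facts() are hypothetical illustrations, not R's code: same_bits() stands in for a memcmp()-based comparison, and each assert states one fact.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when a and b have identical bit patterns,
   which is what a memcmp()-based identical() would test. */
bool same_bits(double a, double b) {
    return memcmp(&a, &b, sizeof(double)) == 0;
}

/* Each assert states one IEEE 754 fact relevant to identical(0, -0).
   Assumes IEEE doubles and no -ffast-math style optimizations. */
void check_zero_and_nan_facts(void) {
    double pz = 0.0, nz = -0.0, qnan = NAN;

    assert(pz == nz);                    /* == treats 0 and -0 as equal    */
    assert(!same_bits(pz, nz));          /* ...but their bit patterns differ */
    assert(signbit(nz) && !signbit(pz)); /* the difference is the sign bit */
    assert(1.0 / pz != 1.0 / nz);        /* Inf versus -Inf                */
    assert(!(qnan == qnan));             /* NaN is unequal to itself       */
}
```

So == alone gives the behavior suggested above for 0 and -0, but it cannot be used unmodified for NA and NaN, which need separate treatment.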

> If R were to start --- with a performance penalty, btw! ---
> to explicitly map all internal '-0' into '+0', we would
> explicitly move away from the international FP standards...
> no way!

Yes, I agree. I did not mean this.

>     PS> Moreover, the efficiency argument for memcmp() does not apply
>     PS> here, since there are different variants of NaN and NA, which
>     PS> should not be distinguished by default.
> 
> your argument is only partly true, as memcmp() can still be
> used instead of '==' *after* the NA treatment {my current
> patch does so},

OK. In that case, memcmp() might still be faster than ==, although
I cannot judge this.
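The two-stage idea (settle NA/NaN first, then compare the remaining values) can be sketched in C roughly as follows. This is not Martin's patch; doubles_identical(), is_r_na() and make_r_na() are hypothetical names. The NA test assumes R's convention that NA_real_ is a NaN whose low-order 32 bits equal 1954. The bitwise flag switches between memcmp() for non-missing values (0 and -0 differ) and == (0 and -0 match), which is exactly the choice under discussion.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: follows R's convention that NA_real_ is a NaN whose
   low-order 32 bits equal 1954; any other NaN counts as plain NaN. */
bool is_r_na(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return isnan(x) && (bits & 0xFFFFFFFFu) == 1954u;
}

/* Builds a value with that NA bit pattern, for testing only. */
double make_r_na(void) {
    uint64_t bits = (UINT64_C(0x7FF00000) << 32) | 1954u;
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Hypothetical two-stage comparison of two double vectors of length n:
   NA/NaN cases are settled first, then non-missing values are compared
   bitwise (memcmp, so 0 and -0 differ) or by value (==, so they match). */
bool doubles_identical(const double *x, const double *y, size_t n,
                       bool bitwise) {
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++) {
        bool xm = isnan(x[i]), ym = isnan(y[i]);
        if (xm || ym) {
            /* Both must be missing, NA paired with NA and NaN with NaN;
               any other payload bits are ignored. */
            if (!(xm && ym) || is_r_na(x[i]) != is_r_na(y[i]))
                return false;
        } else if (bitwise) {
            if (memcmp(&x[i], &y[i], sizeof(double)) != 0)
                return false;
        } else if (x[i] != y[i]) {
            return false;
        }
    }
    return true;
}
```

Either way, the NaN payload question is resolved before memcmp() is reached, so the choice between memcmp() and == only decides the 0 / -0 case.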

> and even more as I have been proposing an option "strict", which
> would only use memcmp() {and hence also distinguish different
> NAs and NaNs}.

My reading of the previous messages in this thread is that there is
agreement that such an option would be very useful and would lead
to faster comparison.
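A "strict" comparison of this kind reduces, in a minimal C sketch, to a single memcmp() over the whole buffer. The names doubles_identical_strict() and double_from_bits() are hypothetical, and this is only an illustration of the idea, not the proposed option itself. Because there is no per-element NA/NaN handling, every bit counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "strict" mode: a single memcmp() over the whole buffer.
   No NA/NaN treatment, so every bit pattern counts: 0 and -0 differ, and
   so do NaNs with different payloads (and NA versus NaN). */
bool doubles_identical_strict(const double *x, const double *y, size_t n) {
    assert(n == 0 || (x != NULL && y != NULL));
    /* memcmp() must not be given NULL pointers, even with a zero length. */
    return n == 0 || memcmp(x, y, n * sizeof(double)) == 0;
}

/* Illustration helper: reinterpret a 64-bit pattern as a double. */
double double_from_bits(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

The speed advantage comes from replacing a per-element loop with one library call, at the price of treating differently encoded NaNs as different values.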

Petr.