[Rd] identical(0, -0)

Petr Savicky savicky at cs.cas.cz
Wed Aug 12 18:20:13 CEST 2009


On Wed, Aug 12, 2009 at 04:02:28PM +0200, Martin Maechler wrote:
> >>>>> "PS" == Petr Savicky <savicky at cs.cas.cz>
> >>>>>     on Wed, 12 Aug 2009 13:50:46 +0200 writes:
> 
>     PS> Let me add the following to the discussion of identical(0, -0).
>     PS> I would like to suggest to replace the paragraph
> 
>     PS> 'identical' sees 'NaN' as different from 'NA_real_', but all
>     PS> 'NaN's are equal (and all 'NA' of the same type are equal).
> 
>     PS> in ?identical by the following text, which is a correction of my previous
>     PS> suggestion for the same paragraph
> 
>     > Components of numerical objects are compared as follows. For non-missing
>     > values, "==" is used. In particular, '0' and '-0' are considered equal. 
>     > All 'NA's of the same type are equal and all 'NaN's are equal, although 
>     > their bit patterns may differ in some cases. 'NA' and 'NaN' are always 
>     > different. 
>     > Note also that 1/0 and 1/(-0) are different.
> 
> the 'numerical' would have to be qualified ('double', 'complex'
> via double), as indeed,  memcmp() is used on integers
> 
> The last sentence is not necessary and probably even confusing:
> Of course, -Inf and Inf are different.

I agree.

>     PS> The suggestion for the default of identical(0, -0) is TRUE, because the
>     PS> negative zero is much less important than NA and NaN and, possibly,
>     PS> distinguishing 0 and -0 could even be deprecated.
> 
> What should that mean??  R *is* using the international floating
> point standards, and 0 and -0 exist there and they *are*
> different!

I am sorry for being too brief. In my opinion, distinguishing 0 and -0 is
not useful enough to justify making the default behavior of identical()
differ from the behavior of == in this case.
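
A minimal sketch in C of the two notions of equality at issue here. The
function names are hypothetical and are not taken from the R sources; the
sketch only assumes IEEE 754 doubles.

```c
#include <stdbool.h>
#include <string.h>

/* Value equality, as the C operator == defines it for doubles. */
bool equal_by_value(double x, double y) {
    return x == y;
}

/* Bit-pattern equality, as memcmp() sees the two doubles. */
bool equal_by_bits(double x, double y) {
    return memcmp(&x, &y, sizeof(double)) == 0;
}
```

Under these assumptions, 0.0 and -0.0 are equal by value but not by bits,
since they differ in the sign bit.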

> If  R  would start --- with a performance penalty, btw ! ---
> to explicitly map all internal '-0' into '+0'  we would
> explicitly move away from the international FP standards... 
> no way!

Yes, I agree. I did not mean this.

>     PS> Moreover, the argument
>     PS> of efficiency of memcmp cannot be used here, since there are different
>     PS> variants of NaN and NA, which should not be distinguished by default.
> 
> your argument is only partly true... as memcmp() can still be
> used instead of '=='  *after* the NA-treatments  {my current
> patch does so},

OK. In this case, memcmp() could still be faster than ==, although
I cannot judge this myself.
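
A rough sketch in C of a comparison that handles NA and NaN first and only
then falls back to either memcmp() or ==. This is not the patch mentioned
above. All function names are hypothetical, and the sketch assumes that an
R-style NA is a NaN whose low 32 bits hold the value 1954.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: an NA is a NaN whose low 32 bits equal 1954. */
bool is_na_like(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return isnan(x) && (uint32_t)(bits & 0xFFFFFFFFu) == 1954u;
}

/* Hypothetical helper: build one such NA-like value. */
double make_na_like(void) {
    uint64_t bits = UINT64_C(0x7FF00000000007A2); /* 0x7A2 == 1954 */
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Compare two doubles: NA/NaN treatment first, then memcmp() or ==. */
bool same_double(double x, double y, bool use_memcmp) {
    bool x_nan = isnan(x), y_nan = isnan(y);
    if (x_nan || y_nan) {
        if (!(x_nan && y_nan))
            return false;                   /* only one side is missing */
        return is_na_like(x) == is_na_like(y); /* NA vs NaN differ */
    }
    if (use_memcmp)
        return memcmp(&x, &y, sizeof(double)) == 0;
    return x == y;
}
```

In this sketch the two variants differ only for 0 and -0: with use_memcmp
set, they are distinguished, and with == they are not.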

> and even more as I have been proposing an option "strict"  which
> would only use memcmp()  {and hence also distinguish different
> NA, NaN's}.

As I understand the previous messages in this thread, there is
agreement that such an option would be very useful and would lead
to faster comparison.
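
A sketch in C of what a purely bitwise comparison of two double vectors
could look like. The names identical_strict and nan_with_payload are
hypothetical illustrations and do not describe the proposed option's actual
implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Strict comparison: a single memcmp() over the whole vector. */
bool identical_strict(const double *x, const double *y, size_t n) {
    return memcmp(x, y, n * sizeof(double)) == 0;
}

/* Hypothetical helper: a quiet NaN carrying the given payload bits. */
double nan_with_payload(uint64_t payload) {
    uint64_t bits = UINT64_C(0x7FF8000000000000)
                  | (payload & UINT64_C(0x0007FFFFFFFFFFFF));
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Such a comparison needs no per-element branching, but it distinguishes NaNs
with different payloads, as well as 0 and -0.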

Petr.


