[Rd] RFC: hexadecimal constants and decimal points

Duncan Murdoch murdoch at stats.uwo.ca
Mon Apr 18 09:33:42 CEST 2005


> On Sun, 17 Apr 2005, Jan T. Kim wrote:
>
>> On Sun, Apr 17, 2005 at 12:38:10PM +0100, Prof Brian Ripley wrote:
>>> These are some points stimulated by reading about C history (and
>>> related in their implementation).
>>>
>>>
>>> 1) On some platforms
>>>
>>>> as.integer("0xA")
>>> [1] 10
>>>
>>> but not all (not on Solaris nor Windows).  We do not define what is
>>> allowed, and rely on the OS's implementation of strtod (yes, not
>>> strtol).
>>> It seems that glibc does allow hex: C99 mandates it but C89 seems not to
>>> allow it.
>>>
>>> I think that was a mistake, and strtol should have been used.  Then C89
>>> does mandate the handling of hex constants and also octal ones.  So
>>> changing to strtol would change the meaning of as.integer("011").
>>
>> I think interpretation of a leading "0" as a prefix indicating an octal
>> representation should indeed be avoided. People not familiar with C will
>> have a hard time understanding and getting used to this concept, and
>> in addition, it happens way too often that numeric data are provided
>> left-padded with zeros.

I agree with this: as.integer("011") should give 11, not 9.
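
To make the difference concrete, here is a minimal C sketch of the two strtol
behaviours under discussion. The helper names parse_auto_base and
parse_decimal are made up for illustration; this is not the code R uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: base 0 asks strtol to pick the base from the
 * prefix, so "0x..." is read as hex and a leading "0" as octal. */
long parse_auto_base(const char *s)
{
    assert(s != NULL);
    return strtol(s, NULL, 0);
}

/* Hypothetical helper: base 10 treats leading zeros as padding and
 * stops at the first non-decimal character (such as 'x'). */
long parse_decimal(const char *s)
{
    assert(s != NULL);
    return strtol(s, NULL, 10);
}
```

With these, parse_auto_base("011") gives 9 while parse_decimal("011") gives
11, which is the surprise that zero-padded data would run into.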

>>> Proposal: we handle this ourselves and define what values are acceptable,
>>> namely for as.integer:
>>>
>>> [+|-][0-9]+
>>> NA
>>> 0[x|X][0-9A-Fa-f]+
>>
>> It can be a somewhat mixed blessing if the string representation of numeric
>> values contains information about their base, in the form of the 0x prefix
>> in this case.
>>
>> The base argument (#3) of C's strtol function can be set to a base
>> explicitly or to 0, which gives the prefix-based "auto-selection"
>> behaviour. On the R level, such a base argument (to as.integer) could be
>> included and a default could be set.
>
> A lot of this is internal, not at R level.
>
>> Personally, I would be equally happy with the default being 0 (auto-select)
>> or 10. Considering the perhaps limited spread of familiarity with C's
>> "0x" idiom, I somewhat favour a consistent and "stubborn" decimal behaviour
>> (base defaults to 10), though.
>
> Some people already rely on it, and those who don't know about it are
> unlikely to ever enter what they think is an illegal value, surely?

As long as we document it, I think the 0x prefix is fine.
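
For what it is worth, a parser that accepts the three proposed forms and
nothing else could look roughly like the sketch below. The name
parse_integer_strict and the parse_status enum are hypothetical. I am also
assuming that the sign is optional, that it applies only to the decimal form
(as the proposal is written), and that "0x" with no digits is an error.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef enum { PARSE_OK, PARSE_NA, PARSE_ERROR } parse_status;

/* Return 1 if p is non-empty and every character satisfies pred. */
static int all_chars(const char *p, int (*pred)(int))
{
    if (*p == '\0')
        return 0;
    for (; *p != '\0'; p++)
        if (!pred((unsigned char)*p))
            return 0;
    return 1;
}

/* Hypothetical strict parser: accepts an optionally signed decimal,
 * the literal "NA", or 0x/0X followed by hex digits. A leading "0"
 * is never taken to mean octal. */
parse_status parse_integer_strict(const char *s, long *out)
{
    const char *digits = s;
    long value;

    assert(s != NULL && out != NULL);

    if (strcmp(s, "NA") == 0)
        return PARSE_NA;

    errno = 0;
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        /* Hex form: validate the digits after the prefix ourselves. */
        if (!all_chars(s + 2, isxdigit))
            return PARSE_ERROR;
        value = strtol(s + 2, NULL, 16);
    } else {
        /* Decimal form: skip an optional sign, then require digits only. */
        if (*digits == '+' || *digits == '-')
            digits++;
        if (!all_chars(digits, isdigit))
            return PARSE_ERROR;
        value = strtol(s, NULL, 10);   /* explicit base 10, so no octal */
    }
    if (errno == ERANGE)
        return PARSE_ERROR;            /* value does not fit in a long */

    *out = value;
    return PARSE_OK;
}
```

Because the characters are checked before strtol is called, the accepted
syntax no longer depends on what the platform's C library happens to allow.
A real version would also have to check the result against R's integer
range, which this sketch does not do.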

We should provide a way to use other bases on input and output.  This
could be through format specifiers, but it would be enough to have a pair
of dedicated functions to do the conversions.
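
As a rough illustration of what such a pair might do internally, here is a C
sketch. The names parse_in_base and format_in_base are invented for this
message and the interface is only an assumption, not a proposal for the
actual R API.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical input function: parse all of s as an integer in the
 * given base (2..36). Returns 0 on success, -1 on any error. Note that
 * strtol itself skips leading whitespace and, for base 16, tolerates
 * an optional "0x" prefix. */
int parse_in_base(const char *s, int base, long *out)
{
    char *end;
    long value;

    assert(s != NULL && out != NULL);
    if (base < 2 || base > 36)
        return -1;

    errno = 0;
    value = strtol(s, &end, base);
    if (end == s || *end != '\0' || errno == ERANGE)
        return -1;                 /* no digits, trailing junk, or overflow */

    *out = value;
    return 0;
}

/* Hypothetical output function: write value in the given base (2..36)
 * into buf, using lower-case letters for digits above 9. Returns 0 on
 * success, -1 if the base is invalid or buf is too small. */
int format_in_base(long value, int base, char *buf, size_t buflen)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    char tmp[sizeof(long) * CHAR_BIT + 2];   /* base 2 worst case plus sign */
    size_t n = 0, i;
    unsigned long mag;

    assert(buf != NULL);
    if (base < 2 || base > 36)
        return -1;

    /* Work on the magnitude as unsigned so LONG_MIN is handled too. */
    mag = value < 0 ? 0UL - (unsigned long)value : (unsigned long)value;
    do {
        tmp[n++] = digits[mag % (unsigned long)base];
        mag /= (unsigned long)base;
    } while (mag != 0);
    if (value < 0)
        tmp[n++] = '-';

    if (n + 1 > buflen)
        return -1;
    for (i = 0; i < n; i++)        /* digits were produced in reverse */
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return 0;
}
```

With an explicit base argument, the string "11" can be read as 9 in base 8
when the user asks for that, while the default conversion stays decimal.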

Duncan Murdoch


