[Rd] type.convert and doubles

peter dalgaard pdalgd at gmail.com
Tue Apr 29 09:32:21 CEST 2014


On 28 Apr 2014, at 19:17 , Martin Maechler <maechler at stat.math.ethz.ch> wrote:
> 
[...snip...]

>>> I think there should be two separate discussions:
> 
>>> a) have an option (argument to type.convert and possibly
>>> read.table) to enable/disable this behavior. I'm strongly
>>> in favor of this.
> 
>> In my (not committed) version of R-devel, I now have
> 
>>> str(type.convert(format(1/3, digits=17), exact=TRUE))
>>  Factor w/ 1 level "0.33333333333333331": 1
>>> str(type.convert(format(1/3, digits=17), exact=FALSE))
>>  num 0.333
> 
>> where the 'exact' argument name has been ``imported'' from
>> the underlying C code.
> 
>> [ As we CRAN package writers know by now, arguments
>> nowadays can hardly be abbreviated anymore, and so I am
>> not open to longer alternative argument names, as someone
>> liking blind typing, I'm not fond of camel case or other
>> keyboard gymnastics (;-) but if someone has a great idea
>> for a better argument name.... ]
> 
>> Instead of only TRUE/FALSE, we could consider NA with
>> semantics "FALSE + warning" or also "TRUE + warning".
> 
> 
>>> b) decide what the default for a) will be. I have no
>>> strong opinion, I can see arguments in both directions
> 
>> I think many have seen the good arguments in both
>> directions.  I'm still strongly advocating that we value
>> long term stability higher here, and revert to more
>> compatibility with the many years of previous versions.
> 
>> If we'd use a default of 'exact=NA', I'd like it to mean
>> FALSE + warning, but would not oppose much to TRUE +
>> warning.
> 
> I have now committed svn rev 65507  --- to R-devel only for now ---
> the above:   exact = NA  is the default
> and it means  "warning + FALSE".
> 
> Interestingly, I currently get 5 identical warnings for one
> simple call, so there clearly seems to be room for optimization, and
> that is one main reason for this change not yet being migrated
> to 'R 3.1.0 patched'.

I actually think that the default should be the old behaviour: no warning, just potentially lost digits. If this gets a user into trouble, _then_ they can turn on the check for lost digits.

After all, I think we had about one single use case where lost digits caused trouble (I cannot even dig up what the case was; someone had something like 20-digit ID labels, I reckon). In contrast, we have seen umpteen cases where people have exported floating point data to slightly beyond machine precision, "just in case", and relied on read.table() to do the sensible thing.
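
To make the contrast concrete, here is a minimal sketch of the two situations. It uses only format() and as.numeric(), so it does not depend on which type.convert() behaviour is in effect. The ID values are made up for illustration (the original use case is not available); they sit around 2^53, where doubles stop representing every integer.

```r
# Case 1: a double exported with 17 significant digits, "just in case".
x <- 1/3
s <- format(x, digits = 17)
y <- as.numeric(s)
print(s)
print(all.equal(x, y))   # compares within numerical tolerance

# Case 2: long integer-like ID labels (hypothetical values).
id1 <- "9007199254740992"   # 2^53
id2 <- "9007199254740993"   # 2^53 + 1, not representable as a double
same <- as.numeric(id1) == as.numeric(id2)
print(same)              # do the two distinct labels collapse to one double?
```

In the first case the extra digits are harmless noise from the export; in the second, on IEEE 754 platforms the two labels should become indistinguishable once they are read as doubles, which is the only situation where keeping the string really matters.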

It's also an open question whether we really want to apply the same logic to double and integer inputs. The whole change went in (as r62327) with the log message

"force type.convert to read e.g. 64-bit integers as strings/factors"

I, for one, did not expect that "e.g." would include 0.12345678901234567. My eyes were on the upcoming 3.0.0 release at that point, so I might not have noticed it anyway, but apparently no one raised an eyebrow. It seems that this was deliberately postponed to 3.1.0, but for more than a year, no one actually exercised the code.

-pd

BTW, "exact" is a horrible name for an option, how about digitloss=c("allow", "warn", "forbid")?
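
A rough sketch of what such three-way semantics could look like. Everything here is hypothetical: convert_numeric() and count_sig_digits() are illustration-only helpers and not part of R, the 15-significant-digit threshold is an assumption rather than the rule used in the C code, and the "forbid" branch hands back the character vector unchanged where type.convert() would return a factor.

```r
# Hypothetical helper: count significant digits in numeric-looking strings.
count_sig_digits <- function(s) {
  mant <- sub("[eE][-+]?[0-9]+$", "", trimws(s))  # drop any exponent part
  digits <- gsub("[^0-9]", "", mant)              # drop sign and decimal point
  digits <- sub("^0+", "", digits)                # drop leading zeros
  nchar(digits)
}

# Hypothetical converter with a three-way digit-loss policy.
convert_numeric <- function(x, digitloss = c("allow", "warn", "forbid"),
                            max_digits = 15L) {   # threshold is an assumption
  digitloss <- match.arg(digitloss)
  num <- suppressWarnings(as.numeric(x))
  if (anyNA(num)) return(x)                # not all numeric: leave as is
  lossy <- count_sig_digits(x) > max_digits
  if (any(lossy)) {
    if (digitloss == "forbid") return(x)   # keep the strings untouched
    if (digitloss == "warn")
      warning("digits may be lost in conversion to double")
  }
  num
}

s <- format(1/3, digits = 17)
str(convert_numeric(s, "allow"))    # converts silently
str(convert_numeric(s, "forbid"))   # stays a string
```

Because match.arg() picks the first choice when the argument is omitted, "allow" (the old behaviour) is the default in this sketch.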


> 
> Martin
> 
> 
>> I agree that for the TRUE case, it may make more sense to
>> return string-like object of a new (simple) class such as
>> "bignum" that was mentioned in this thread.
> 
>> OTOH, this functionality should make it into an R 3.1.1 in
>> the not so distant future, and thinking through
>> consequences and implementing the new class approach may
>> just take a tad too much time...
> 
>> Martin
> 
>>> But most importantly I think a) is better than the status
>>> quo - even if the discussion about b) drags out.
> 
>>> Cheers, Simon
> 
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


