[Rd] type.convert and doubles

Martin Maechler maechler at stat.math.ethz.ch
Mon Apr 28 19:17:03 CEST 2014


>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Sat, 26 Apr 2014 22:59:17 +0200 writes:

>>>>> Simon Urbanek <simon.urbanek at r-project.org>
>>>>>     on Sat, 19 Apr 2014 13:06:15 -0400 writes:

    >> On Apr 19, 2014, at 9:00 AM, Martin Maechler
    >> <maechler at stat.math.ethz.ch> wrote:
    >>>>>>>> McGehee, Robert <Robert.McGehee at geodecapital.com>
    >>>>>>>> on Thu, 17 Apr 2014 19:15:47 -0400 writes:
    >>> 
    >>>>> This is all application specific and sort of beyond
    >>>>> the scope of type.convert(), which now behaves as it
    >>>>> has been documented to behave.
    >>> 
    >>>> That's only a true statement because the documentation
    >>>> was changed to reflect the new behavior! The new
    >>>> feature in type.convert certainly does not behave
    >>>> according to the documentation as of R 3.0.3. Here's a
    >>>> snippet:
    >>> 
    >>>> The first type that can accept all the non-missing
    >>>> values is chosen (numeric and complex return values
    >>>> will be represented approximately, of course).
    >>> 
    >>>> The key phrase is in parentheses, which reminds the
    >>>> user to expect a possible loss of precision. That
    >>>> important parenthetical was removed from the
    >>>> documentation in R 3.1.0 (among other changes).
    >>> 
    >>>> Putting aside the fact that this introduces a large
    >>>> amount of unnecessary work rewriting SQL / data import
    >>>> code and SQL packages, my biggest conceptual problem is
    >>>> that I can no longer rely on a particular function call
    >>>> returning a particular class. In my example querying
    >>>> stock prices, about 5% of prices came back as factors
    >>>> and the remaining 95% as numeric, so we had random
    >>>> errors popping in throughout the morning.
    >>> 
    >>>> Here's a short example showing how the new behavior can
    >>>> be unreliable. I pass a character representation of a
    >>>> uniformly distributed random variable to type.convert.
    >>>> 90% of the time it is converted to "numeric" and 10% of
    >>>> the time to a "factor" (in R 3.1.0). In the 10% of cases
    >>>> in which type.convert converts to a factor, the leading
    >>>> non-zero digit is always a 9. So if you
    >>>> were expecting a numeric value, then 1 in 10 times you
    >>>> may have a bug in your code that didn't exist before.
    >>> 
    >>>>> options(digits=16)
    >>>>> cl <- NULL; for (i in 1:10000) cl[i] <- class(type.convert(format(runif(1))))
    >>>>> table(cl)
    >>>> cl
    >>>>  factor numeric
    >>>>     990    9010
    >>> 
    >>> Yes.
    >>> 
    >>> Murray's point is valid, too.
    >>> 
    >>> But in my view, with the reasoning we have seen here,
    >>> *and* with the well-known software design principle of
    >>> "least surprise" in mind, I also do think that the
    >>> default for type.convert() should be what it has been
    >>> for > 10 years now.
    >>> 

    >> I think there should be two separate discussions:

    >> a) have an option (argument to type.convert and possibly
    >> read.table) to enable/disable this behavior. I'm strongly
    >> in favor of this.

    > In my (not committed) version of R-devel, I now have

    >> str(type.convert(format(1/3, digits=17), exact=TRUE))
    >   Factor w/ 1 level "0.33333333333333331": 1
    >> str(type.convert(format(1/3, digits=17), exact=FALSE))
    >   num 0.333

    > where the 'exact' argument name has been ``imported'' from
    > the underlying C code.

    > [ As we CRAN package writers know by now, arguments
    > nowadays can hardly be abbreviated anymore, and so I am not
    > keen on longer alternative argument names; as someone who
    > likes touch typing, I'm not fond of camel case or other
    > keyboard gymnastics (;-) but if someone has a great idea
    > for a better argument name.... ]

    > Instead of only TRUE/FALSE, we could consider NA with
    > semantics "FALSE + warning" or also "TRUE + warning".


    >> b) decide what the default for a) will be. I have no
    >> strong opinion; I can see arguments in both directions

    > I think many have seen the good arguments in both
    > directions.  I'm still strongly advocating that we value
    > long-term stability more highly here, and revert to
    > behaviour more compatible with the many years of previous
    > versions.

    > If we were to use a default of 'exact = NA', I'd like it to
    > mean FALSE + warning, but I would not strongly oppose TRUE +
    > warning.

I have now committed svn rev 65507 --- to R-devel only for now ---
implementing the above:  exact = NA  is the default,
and it means "warning + FALSE".

Interestingly, I currently get 5 identical warnings for one
simple call, so there is clearly room for optimization, and
that is one main reason this change has not yet been ported
to 'R 3.1.0 patched'.
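
For concreteness, a small sketch of the committed semantics described
above, assuming the R-devel (svn rev 65507) behaviour; the argument
name, default, and warning details may still change before any release:

    x <- format(1/3, digits = 17)        # "0.33333333333333331"
    str(type.convert(x))                 # default exact = NA: num 0.333, plus a warning
    str(type.convert(x, exact = FALSE))  # num 0.333, silently
    str(type.convert(x, exact = TRUE))   # Factor w/ 1 level "0.33333333333333331"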

Martin


    > I agree that for the TRUE case, it may make more sense to
    > return a string-like object of a new (simple) class such as
    > "bignum", as was mentioned in this thread.

    > OTOH, this functionality should make it into an R 3.1.1 in
    > the not-so-distant future, and thinking through the
    > consequences and implementing the new class approach may
    > just take a tad too much time...
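
To make the "bignum" idea above concrete, here is a purely
illustrative sketch of such a lightweight class (the names as_bignum
and bignum_to_double are hypothetical; nothing of this kind has been
committed or formally proposed):

    ## hypothetical sketch only: keep the exact digits as a character
    ## string, tagged with a simple class, so any lossy conversion to
    ## double has to be requested explicitly
    as_bignum <- function(x) structure(as.character(x), class = "bignum")

    print.bignum <- function(x, ...) {
      cat("<bignum>", unclass(x), "\n")
      invisible(x)
    }

    ## explicit (and possibly lossy) conversion back to double
    bignum_to_double <- function(x) as.numeric(unclass(x))

    b <- as_bignum("0.33333333333333331")
    b                    # <bignum> 0.33333333333333331
    bignum_to_double(b)  # an ordinary double again, with the usual rounding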

    > Martin

    >> But most importantly I think a) is better than the status
    >> quo - even if the discussion about b) drags out.

    >> Cheers, Simon

    > ______________________________________________
    > R-devel at r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel


