[Rd] type.convert and doubles
Robert.McGehee at geodecapital.com
Mon Apr 21 15:24:13 CEST 2014
Agreed. Perhaps even a global option would make sense. We already have an
option with a similar spirit: 'options("stringsAsFactors"=T/F)'. Perhaps
'options("exactNumericAsString"=T/F)' [or something else] would be
desirable, with the option being the default value to the type.convert
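
To make the idea concrete, here is a rough sketch of how a global option
could supply the default for an argument. Everything in it is an
assumption for illustration: 'exactNumericAsString' is only the option
name floated above, 'convert_with_option' and 'has_excess_digits' are
made-up helpers, and the "more than 15 significant digits" test is a
crude stand-in for whatever type.convert actually checks.

```r
# Hypothetical helper: TRUE when a numeric string carries more than 15
# significant digits, used here as a rough proxy for "a double may not
# hold this value as written". Trailing zeros are counted too.
has_excess_digits <- function(s) {
  mantissa <- sub("[eE].*$", "", s)        # drop any exponent part
  digits <- gsub("[^0-9]", "", mantissa)   # keep only the digits
  digits <- sub("^0+", "", digits)         # leading zeros are not significant
  nchar(digits) > 15
}

# Hypothetical wrapper: the argument default is read from a global option
# (itself hypothetical), in the same spirit as stringsAsFactors.
convert_with_option <- function(x,
    exact_as_string = getOption("exactNumericAsString", FALSE)) {
  x <- as.character(x)
  ok <- !is.na(x)
  num <- suppressWarnings(as.numeric(x))
  if (anyNA(num[ok])) return(x)            # not numeric at all: keep strings
  if (exact_as_string && any(has_excess_digits(x[ok]))) return(x)
  num                                      # otherwise accept the double
}
```

With the option unset, 'convert_with_option("0.12345678901234567890")'
gives a numeric; after 'options(exactNumericAsString = TRUE)' the same
call hands back the original string. The caller can still override the
option per call through the argument.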
I also like Gabor's idea of a "distinguishing class". R doesn't natively
support arbitrary precision numbers (AFAIK), but I think that's what
Murray wants. I could imagine some kind of new class emerging here that
initially looks just like a character/factor, but may evolve over time to
accept arithmetic methods and act more like a number (e.g. knowing that
"0.1", ".10" and "1e-1" are the same number, or that "-9"<"-0.2"). A class
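
As a very rough sketch of what such a class might look like at first:
the class name 'exact_num' and its constructor are invented for this
example, and the comparisons simply fall back to doubles, so this only
shows the interface and does NOT give arbitrary precision.

```r
# Hypothetical S3 class: the value is stored as the original string, so
# nothing is lost on import.
exact_num <- function(x) structure(as.character(x), class = "exact_num")

# Comparison operators compare by numeric value instead of by string.
# Assumption: for brevity the values are compared as doubles; a real
# implementation would need an exact decimal comparison.
Ops.exact_num <- function(e1, e2) {
  if (!.Generic %in% c("==", "!=", "<", "<=", ">", ">="))
    stop("'", .Generic, "' is not implemented for exact_num")
  get(.Generic)(as.numeric(unclass(e1)), as.numeric(unclass(e2)))
}

# Print like a plain character vector, without quotes.
print.exact_num <- function(x, ...) {
  print(unclass(x), quote = FALSE)
  invisible(x)
}

x <- exact_num(c("0.1", ".10", "1e-1"))
x == exact_num("0.1")               # compared as numbers, not as strings
exact_num("-9") < exact_num("-0.2") # numeric ordering
```

Arithmetic methods could then be added one operator at a time, with
anything not yet supported raising an error instead of silently
coercing.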
On 4/20/14, 3:24 AM, "Murray Stokely" <murray at stokely.org> wrote:
>Yes, I'm also strongly in favor of having an option for this. If
>there was an option in base R for controlling this we would just use
>that and get rid of the separate RProtoBuf.int64AsString option we use
>in the RProtoBuf package on CRAN to control whether 64-bit int types
>from C++ are returned to R as numerics or character vectors.
>I agree that reasonable people can disagree about the default, but I
>found my original bug report about this, so I will counter Robert's
>example with my favorite example of what was wrong with the previous
>write.csv(tmp, "/tmp/foo.csv", quote=FALSE, row.names=FALSE)
>data <- read.csv("/tmp/foo.csv")
> - Murray
>On Sat, Apr 19, 2014 at 10:06 AM, Simon Urbanek
><simon.urbanek at r-project.org> wrote:
>> On Apr 19, 2014, at 9:00 AM, Martin Maechler
>><maechler at stat.math.ethz.ch> wrote:
>>>>>>>> McGehee, Robert <Robert.McGehee at geodecapital.com>
>>>>>>>> on Thu, 17 Apr 2014 19:15:47 -0400 writes:
>>>>> This is all application specific and
>>>>> sort of beyond the scope of type.convert(), which now behaves as it
>>>>> has been documented to behave.
>>>> That's only a true statement because the documentation was changed to
>>>>reflect the new behavior! The new feature in type.convert certainly
>>>>does not behave according to the documentation as of R 3.0.3. Here's a
>>>> The first type that can accept all the
>>>> non-missing values is chosen (numeric and complex return values
>>>> will represented approximately, of course).
>>>> The key phrase is in parentheses, which reminds the user to expect a
>>>>possible loss of precision. That important parenthetical was removed
>>>>from the documentation in R 3.1.0 (among other changes).
>>>> Putting aside the fact that this introduces a large amount of
>>>>unnecessary work rewriting SQL / data import code, SQL packages, my
>>>>biggest conceptual problem is that I can no longer rely on a
>>>>particular function call returning a particular class. In my example
>>>>querying stock prices, about 5% of prices came back as factors and the
>>>>remaining 95% as numeric, so we had random errors popping in
>>>>throughout the morning.
>>>> Here's a short example showing us how the new behavior can be
>>>>unreliable. I pass a character representation of a uniformly
>>>>distributed random variable to type.convert. 90% of the time it is
>>>>converted to "numeric" and 10% it is a "factor" (in R 3.1.0). In the
>>>>10% of cases in which type.convert converts to a factor the leading
>>>>non-zero digit is always a 9. So if you were expecting a numeric
>>>>value, then 1 in 10 times you may have a bug in your code that didn't
>>>>> cl <- NULL; for (i in 1:10000) cl[i] <-
>>>> factor numeric
>>>> 990 9010
>>> Murray's point is valid, too.
>>> But in my view, with the reasoning we have seen here,
>>> *and* with the well known software design principle of
>>> "least surprise" in mind,
>>> I also do think that the default for type.convert() should be what
>>> it has been for > 10 years now.
>> I think there should be two separate discussions:
>> a) have an option (argument to type.convert and possibly read.table) to
>>enable/disable this behavior. I'm strongly in favor of this.
>> b) decide what the default for a) will be. I have no strong opinion, I
>>can see arguments in both directions
>> But most importantly I think a) is better than the status quo - even if
>>the discussion about b) drags out.