[R] Best practice: to factor or not to factor for float variables

MacQueen, Don macqueen1 at llnl.gov
Sat Jul 5 21:22:45 CEST 2014


However,

> format((0.1+0.2)) == format(0.3)
[1] TRUE

This suggests that if you want to treat measured variables as categories,
one way to do it is to format them first.
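A minimal base-R sketch of that idea. It assumes hypothetical vectors `heights` and `values` (not from the thread) and R's default options(digits = 7). As raw doubles the heights form two groups; as formatted keys they form one:

```r
# Hypothetical data: the same nominal height (0.3) reached by different arithmetic
heights <- c(0.1 + 0.2, 0.3, 0.3, 0.1 + 0.2)
values  <- c(10, 12, 14, 16)

# As exact doubles there are two distinct heights, not one
length(unique(heights))            # 2
format(heights, digits = 17)       # 17 significant digits expose the difference

# format() rounds to getOption("digits") significant digits (7 by default),
# so the character keys collapse to a single category
key <- format(heights)
length(unique(key))                # 1

# Group means computed over the formatted key
means <- tapply(values, key, mean)
means[["0.3"]]                     # 13
```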

Of course, one may have to control the format more carefully than above
(if necessary, see for example ?formatC).
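One reason the format needs controlling is that format() formats a whole vector to a common width and number of decimals. A height formatted in one data set can therefore get a different key than the same height formatted in another. The sketch below uses formatC() with a fixed number of decimals, with sprintf() as an equivalent. The choice of two decimals is an assumption standing in for the real measurement resolution.

```r
# format() pads to a common width and number of decimals within one call...
format(c(2, 10.25))                              # " 2.00" "10.25"
# ...so the height 2 formatted on its own gets a different key
format(2)                                        # "2"

# formatC() with a fixed format gives the same key regardless of neighbours
formatC(c(2, 10.25), format = "f", digits = 2)  # "2.00" "10.25"
formatC(2, format = "f", digits = 2)            # "2.00"

# sprintf() with an explicit C-style format is equivalent here
sprintf("%.2f", c(2, 10.25))                     # "2.00" "10.25"
```

Fixed-decimal keys have one caveat: a tiny negative value such as -0.001 formats as "-0.00", which does not match "0.00".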

merge() on carefully formatted float values may be more reliable than
merging on the floats themselves.
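The merge() point can be sketched with two hypothetical data frames. The names `temp`, `wind` and `hkey` are illustrative, not from the thread, and two decimals is again an assumed resolution. With a single numeric `by` column, merge() matches the doubles exactly, so in this example the 0.3 row is lost. Merging on a fixed-format character key keeps it.

```r
# Hypothetical data sets measured at nominally the same heights
temp <- data.frame(height = c(0.1 + 0.2, 1.0), temp = c(15.2, 14.8))
wind <- data.frame(height = c(0.3, 1.0),       wind = c(2.1, 3.4))

# Merging on the raw doubles: 0.1 + 0.2 does not equal 0.3, so that row is dropped
m_raw <- merge(temp, wind, by = "height")
nrow(m_raw)    # 1

# Build an explicitly formatted key (2 decimals assumed) in both data sets
temp$hkey <- formatC(temp$height, format = "f", digits = 2)
wind$hkey <- formatC(wind$height, format = "f", digits = 2)

# Merge on the key, keeping only one copy of the numeric height
m_key <- merge(temp, wind[, c("hkey", "wind")], by = "hkey")
nrow(m_key)    # 2
```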

-Don


-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 7/4/14 4:04 AM, "Sebastian Schubert" <schubert.seb at gmail.com> wrote:

>Hi,
>
>I would like to ask for best practice advice on the design of data
>structure and the connected analysis techniques.
>
>In my particular case, I have measurements of several variables at
>several, sometimes equal, heights. Following the tidy data approach of
>Hadley Wickham, I want to put all data in one data frame. In principle,
>the height variable is something like a category. For example, I want to
>average over time for every height. Using dplyr, this works very well
>when my height variable is a factor. However, if it is not a factor, the
>grouping sometimes does not work, probably due to numerical issues:
>
>http://stackoverflow.com/questions/24555010/dplyr-and-group-by-factor-vs-no-factor
>https://github.com/hadley/dplyr/issues/482
>
>Even if the behaviour described in the links above is a bug, one can
>easily create other numerical issues in R:
>> (0.1+0.2) == 0.3
>[1] FALSE
>
>Thus, it seems one should avoid grouping by float values and, in my
>case, use factors. However, from time to time, I need the numerical
>character of the heights: compare heights, find the maximum height, etc.
>Here, the ordered factor approach might help. However, I have to combine
>(via rbind or merge) different data sets quite often, so keeping the
>order of the different ordered factor heights also seems to be difficult.
>
>Is there any general approach which reduces the work or do I have to
>switch between approaches as needed?
>
>Thanks a lot for any input,
>Sebastian
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
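The quoted question needs both categories and numeric comparisons across rbind/merge. One possible pattern is sketched below. It is an assumption, not something proposed in the thread:

- Keep height as a numeric column.
- Snap it to the measurement resolution with round() (two decimals assumed).
- Derive an ordered factor only when needed, after combining.

The data frames `a`, `b` and `d` are hypothetical.

```r
# Hypothetical data sets with a numeric height column
a <- data.frame(height = c(2, 10, 0.1 + 0.2), x = 1:3)
b <- data.frame(height = c(0.3, 5),           x = 4:5)

# Combining numeric columns raises no factor-level issues
d <- rbind(a, b)

# Snap heights to the assumed measurement resolution (2 decimals)
d$height <- round(d$height, 2)

# Derive an ordered factor on demand, with levels in numeric order
lv <- sort(unique(d$height))
d$hfac <- factor(d$height, levels = lv, ordered = TRUE)

levels(d$hfac)     # "0.3" "2" "5" "10"
max(d$hfac)        # ordered factors support max()
max(d$height)      # numeric operations still use the numeric column
```

Building the levels from the sorted numeric values avoids the lexical order a plain character key would give, where "10" sorts before "2".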


