[Rd] suggestion for extending ?as.factor

Martin Maechler maechler at stat.math.ethz.ch
Mon May 4 10:40:12 CEST 2009


>>>>> "PS" == Petr Savicky <savicky at cs.cas.cz>
>>>>>     on Sun, 3 May 2009 22:32:04 +0200 writes:

    PS> In R-2.10.0, the development version, the function as.factor() uses
    PS> 17-digit precision when converting numeric values to character type.
    PS> This is very good for the consistency of the resulting factor. However,
    PS> I expect that people will complain about, for example, as.factor(0.3)
    PS> being
    PS> [1] 0.29999999999999999
    PS> Levels: 0.29999999999999999

    PS> I suggest extending the "Warning" section of ?as.factor with the following
    PS> paragraph.

    PS> If as.factor() is used for a numeric vector, then the numbers are
    PS> converted to character strings with 17-digit precision using their
    PS> machine representation. This guarantees that different numbers are
    PS> converted to different levels, but may produce unwanted results if
    PS> the numbers are expected to have a limited number of decimal places.
    PS> For example, as.factor(c(0.1, 0.2, 0.3)) produces
    PS> [1] 0.10000000000000001 0.20000000000000001 0.29999999999999999
    PS> Levels: 0.10000000000000001 0.20000000000000001 0.29999999999999999
    PS> In order to avoid this, convert the numbers to a character vector
    PS> using formatC() or a similar function before using as.factor().

    PS> Petr.

Thank you, Petr, for the good suggestion.

I have added a (shorter) paragraph, though to the 'Details' rather than the
'Warning' section, and also an example to the 'Examples' section:

## Converting (non-integer) numbers:
as.factor(c(0.1, 0.2, 0.3)) # maybe not what you'd expect, so rather use
factor(format(c(0.1, 0.2, 0.3)))
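
To make the difference between the two formatting routes concrete, here is a
minimal sketch. It is an illustration only, not the text added to the help
page; the variable names are made up, and the choice of one decimal place in
formatC() is an assumption about how the data were recorded.

```r
## Illustrative only: variable names are hypothetical.
x <- c(0.1, 0.2, 0.3)

## format() picks a common representation for the whole vector
## (by default up to 7 significant digits).
f_format <- factor(format(x))

## formatC() lets you state the number of decimals explicitly;
## digits = 1 assumes the data have one decimal place.
f_formatC <- factor(formatC(x, digits = 1, format = "f"))

levels(f_format)
levels(f_formatC)

## Recover the numeric values from the level labels if needed.
x_back <- as.numeric(as.character(f_format))
```

Note that format() pads all elements to a common width, so for values of
different magnitude the level labels may contain leading blanks; formatC()
without a 'width' argument does not pad.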

Martin Maechler, ETH Zurich


