[Rd] suggestion for extending ?as.factor

Martin Maechler maechler at stat.math.ethz.ch
Fri May 8 18:48:40 CEST 2009

>>>>> "PS" == Petr Savicky <savicky at cs.cas.cz>
>>>>>     on Fri, 8 May 2009 18:10:56 +0200 writes:

    PS> On Fri, May 08, 2009 at 05:14:48PM +0200, Petr Savicky wrote:
    >> Let me suggest the following modification, where match() is applied
    >> to the strings rather than to the original values.
    >> levels <- unique(as.character(sort(unique(x))))
    >> x <- as.character(x)
    >> f <- match(x, levels)
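
A minimal sketch of this first suggestion, wrapped in a function so it can be tried directly. The name factor_via_strings() is hypothetical, and building the result with structure() is an assumption of the sketch, not part of the proposal. The example values are chosen so that two distinct doubles print identically.

```r
# Hypothetical helper (not part of R): the first suggestion, where match()
# compares strings rather than the original numeric values.
factor_via_strings <- function(x) {
  levels <- unique(as.character(sort(unique(x))))  # sort values, convert, drop repeated strings
  x <- as.character(x)
  f <- match(x, levels)                            # integer codes; NA inputs stay NA
  structure(f, levels = levels, class = "factor")
}

x <- c(0.3, 0.1 + 0.2, 0.1)   # 0.1 + 0.2 differs from 0.3 but prints as "0.3"
f <- factor_via_strings(x)
levels(f)                     # "0.1" "0.3"
```

Because 0.3 and 0.1 + 0.2 both convert to "0.3", they share one level instead of yielding two identical-looking levels.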

    PS> An alternative solution is

    > ind <- order(x)
    > x <- as.character(x) # or any other conversion to character
    > levels <- unique(x[ind]) # get unique levels ordered by the original values
    > f <- match(x, levels)
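
The alternative can be sketched the same way. The name factor_by_order() and its convert argument are hypothetical; convert stands in for the conversion parameter Petr mentions. One deliberate deviation from the four lines above: order(x, na.last = NA) drops missing values, so that NA does not become a level, as with factor()'s default.

```r
# Hypothetical helper (not part of R): sort on the original values,
# but convert to character only once.
factor_by_order <- function(x, convert = as.character) {
  ind <- order(x, na.last = NA)   # permutation sorting x; NAs are dropped
  x <- convert(x)                 # the single conversion to character
  levels <- unique(x[ind])        # unique strings, ordered by the original values
  f <- match(x, levels)           # integer codes; NA inputs stay NA
  structure(f, levels = levels, class = "factor")
}

x <- c(10, 9, 0.1 + 0.2, 0.3)
f <- factor_by_order(x)
levels(f)                         # "0.3" "9" "10"
```

The levels follow numeric order ("9" before "10"), not the alphabetical order a plain string sort would give.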

Yes, that's an interesting, quite different, and simple approach.

    PS> The advantage of this over the suggestion from my previous email is that
    PS> the string conversion is applied only once. The conversion need not be only
    PS> as.character(); other choices could be specified by a parameter. I have
    PS> strong objections to the existing implementation of as.character(),

{(because it is not *accurate* enough, right?)}

    PS> but I still think that as.character() should be the default for factor()
    PS> for the sake of consistency of the R language.
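
To illustrate the accuracy concern (an editorial example, not from the thread): as.character() represents doubles with 15 significant digits, so two different doubles can map to the same string, while sprintf() with 17 digits tells them apart.

```r
a <- 0.1 + 0.2
b <- 0.3
a == b                                        # FALSE: the doubles differ
identical(as.character(a), as.character(b))   # TRUE: both become "0.3"
sprintf("%.17g", c(a, b))                     # 17 digits distinguish them
```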

Hmm...  Very early in this thread, Peter Dalgaard
remarked that, at least for its use in table(..),
factor() should not be extremely accurate, and that is what
R-devel's factor() has been doing recently.

But then, table(.) could be changed to explicitly call
    factor(signif(x, 15), ...)
for the case of numeric x.
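
A small illustration of rounding to 15 significant digits before tabulating. The data are made up, and the code only shows the effect of signif() on its input, not how table() itself is implemented.

```r
# Made-up data: pairs of values that differ only by floating-point error.
x <- c(0.3, 0.1 + 0.2, 0.7, 0.1 * 7)
length(unique(x))      # 4 distinct doubles
xr <- signif(x, 15)    # rounding to 15 significant digits merges each pair
length(unique(xr))     # 2
f <- factor(xr)
tab <- table(xr)       # one cell per merged value, each with count 2
```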

BTW: I found that practically all the remaining border cases you
     had are "solved" by using 14 instead of 15 significant digits.

I'm currently testing a version of factor() that uses 14 digits
*and* adds an extra final test on the levels, removing duplicated ones.
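
A rough sketch of the two ingredients described above: 14 significant digits plus a final removal of duplicated levels. The name factor14() is hypothetical and this is not the version of factor() under test; it only shows why a duplicate check becomes necessary once distinct values are rounded.

```r
# Hypothetical helper (not the factor() under test): round to 14 significant
# digits, then drop level strings that turn out to be duplicated.
factor14 <- function(x) {
  y <- sort(unique(x))                           # sort() drops NA by default
  levels <- as.character(signif(y, 14))          # distinct values may now collide
  levels <- levels[!duplicated(levels)]          # final test on the levels
  f <- match(as.character(signif(x, 14)), levels)
  structure(f, levels = levels, class = "factor")
}

x <- c(0.3, 0.1 + 0.2, 0.7, 0.1 * 7, NA)
f <- factor14(x)
levels(f)                                        # "0.3" "0.7"
as.integer(f)                                    # 1 1 2 2 NA
```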


More information about the R-devel mailing list