[Rd] suggestion for extending ?as.factor

Martin Maechler maechler at stat.math.ethz.ch
Sat May 9 22:55:17 CEST 2009


>>>>> "PS" == Petr Savicky <savicky at cs.cas.cz>
>>>>>     on Fri, 8 May 2009 18:10:56 +0200 writes:

    PS> On Fri, May 08, 2009 at 05:14:48PM +0200, Petr Savicky wrote:
    >> Let me suggest to consider the following modification, where match() is done
    >> on the strings, not on the original values.
    >> levels <- unique(as.character(sort(unique(x))))
    >> x <- as.character(x)
    >> f <- match(x, levels)

    PS> An alternative solution is

    PS> ind <- order(x)
    PS> x <- as.character(x) # or any other conversion to character
    PS> levels <- unique(x[ind]) # get unique levels ordered by the original values
    PS> f <- match(x, levels)

(slightly but not much more complicated though).

Yes, indeed that brings us back to (something like) the original
"use  factor(format(x))  ..."  suggestion which would have been
fine if there hadn't been the issue of ordering,
exactly what you've addressed before.
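
To make the ordering point concrete, here is a minimal sketch of the
order-then-convert variant on a small numeric vector. The input vector and
the names `s`, `lev` and `lev_chr` are made up for illustration; this is not
the code in factor() itself.

```r
# Sketch only: levels ordered by the original numeric values,
# with match() done on the strings.
x <- c(10, 2, 10, 0.5, 2)       # hypothetical example input

ind <- order(x)                 # ordering taken from the original values
s   <- as.character(x)          # the string conversion is applied only once
lev <- unique(s[ind])           # unique strings, in numeric order
f   <- match(s, lev)            # integer codes into 'lev'

lev                             # "0.5" "2" "10"
f                               # 3 2 3 1 2

# For comparison: sorting the strings themselves loses the numeric order
# (character sorting typically places "10" before "2").
lev_chr <- sort(unique(s))
```

Taking the order from x before converting is what keeps the levels in
numeric order while the matching itself is purely on strings.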


    PS> The advantage of this over the suggestion from my previous email is that
    PS> the string conversion is applied only once. The conversion need not be only
    PS> as.character(). There may be other choices specified by a parameter. I have
    PS> strong objections against the existing implementation of as.character(),
    PS> but still I think that as.character() should be the default for factor()
    PS> for the sake of consistency of the R language.

The biggest advantage of reverting to something like
that would be that it is really simple.

My first tests with (a variation of) the above indicate
favorable results.  More on this on Monday.
If we reverted to such a solution,
we'd have to get back to Peter's point about the issue that
he'd think  table(.) should be more tolerant than as.character()
about "almost equality".
For compatibility reasons, we could also return to the
reasoning that useR should use {something like}
    table(signif(x, 14)) 
instead of
    table(x) 
for numeric x in "typical" cases.
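
As an illustration of the "almost equality" issue, the following sketch uses
a made-up vector in which one value differs from the others only by floating
point rounding error. What table(x) itself prints depends on how the R
version in use builds factor levels, so only the signif() route is shown
with its result.

```r
# Sketch only: x is a hypothetical example; 0.1 + 0.2 is not exactly 0.3.
x <- c(0.3, 0.1 + 0.2, 0.3)

n_raw     <- length(unique(x))              # 2 distinct doubles
n_rounded <- length(unique(signif(x, 14)))  # 1 after rounding to 14 digits

# Tabulating the rounded values gives a single cell with count 3.
tab <- table(signif(x, 14))
```

Rounding to 14 significant digits discards the last digit or two of a
double, which is where such rounding error usually sits.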

Martin


