[R] Basic question on concatenating factors

Stavros Macrakis macrakis at alum.mit.edu
Sun Nov 23 05:43:43 CET 2008


On Sat, Nov 22, 2008 at 10:20 AM, jim holtman <jholtman at gmail.com> wrote:
>  c.Factor <-
> function (x, y)
> {
>    newlevels = union(levels(x), levels(y))
>    m = match(levels(y), newlevels)
>    ans = c(unclass(x), m[unclass(y)])
>    levels(ans) = newlevels
>    class(ans) = "factor"
>    ans
> }

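This algorithm depends crucially on union preserving the order of the
elements of its arguments. As far as I can tell, the spec of union
does not require this.  If union were to (for example) sort its
arguments and then merge them (generally a more efficient algorithm),
this function would silently mislabel elements of x: the codes in
unclass(x) are reused unchanged, so they are only correct if levels(x)
occupy the first positions of newlevels, in their original order.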
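To see the failure concretely, here is a minimal sketch that simulates an order-changing union by sorting its result. The name c_factor_sorted is hypothetical and used only for illustration; otherwise the body follows the c.Factor quoted above.

```r
# Hypothetical variant of c.Factor: identical except that newlevels is
# sorted, simulating a union() that does not preserve argument order.
c_factor_sorted <- function(x, y) {
  newlevels <- sort(union(levels(x), levels(y)))
  m <- match(levels(y), newlevels)      # y's codes are remapped...
  ans <- c(unclass(x), m[unclass(y)])   # ...but x's codes are copied as-is
  levels(ans) <- newlevels
  class(ans) <- "factor"
  ans
}

# levels(x) deliberately not in sorted order
x <- factor(c("b", "c"), levels = c("c", "b"))
y <- factor("a")
result <- c_factor_sorted(x, y)
as.character(result)
```

Here as.character(result) is "b" "a" "a" rather than the expected "b" "c" "a": the element of x coded 1 (meaning "c") now points at "a", the first sorted level.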

Fortunately, the fix is simple.  Instead of union, use:

     newlevels <- c(levels(x), setdiff(levels(y), levels(x)))

which is guaranteed to preserve the order of levels(x), since they are
placed first regardless of what order setdiff returns.
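For completeness, a sketch of the quoted function with only the newlevels line changed as suggested (assuming, as in the original, exactly two factor arguments):

```r
# c.Factor with the suggested fix; only the newlevels line differs.
c.Factor <- function(x, y) {
  # levels(x) come first, in their original order, so the integer
  # codes in unclass(x) remain valid indices into newlevels.
  newlevels <- c(levels(x), setdiff(levels(y), levels(x)))
  # Position of each of y's levels within newlevels.
  m <- match(levels(y), newlevels)
  # Keep x's codes; translate y's codes through m.
  ans <- c(unclass(x), m[unclass(y)])
  levels(ans) <- newlevels
  class(ans) <- "factor"
  ans
}

x <- factor(c("b", "c"), levels = c("c", "b"))
y <- factor(c("a", "b"))
z <- c.Factor(x, y)
```

Here as.character(z) is "b" "c" "a" "b", and levels(z) is "c" "b" "a": x's level order is kept, and y's new level "a" is appended after it.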

             -s
