[Rd] suggestion for extending ?as.factor

Petr Savicky savicky at cs.cas.cz
Fri May 8 17:14:48 CEST 2009


On Fri, May 08, 2009 at 03:18:01PM +0200, Martin Maechler wrote:
> As long as we don't want to allow  factor(<numeric>) to fail --rarely -- 
> I think (and that actually has been a recurring daunting thought
> for quite a few days) that we probably need an
> extra step of checking for duplicate levels, and if we find
> some, recode "everything". This will blow up the body of the
> factor() function even more.
> 
> What alternatives do you (all R-devel readers!) see?

The command 
  f <- match(x, levels)
in factor() uses the original values and not their string representations.
I think the main reason for this is that we would lose the numeric ordering
of the levels if the conversion to character were done before the levels are
sorted.
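
A small sketch of that ordering problem, using made-up values (the names
x, by_value and by_string are only for illustration and do not appear in
factor()):

```r
x <- c(10, 9, 100)

# Sort numerically first, then convert: the numeric order is kept.
by_value <- as.character(sort(unique(x)))

# Convert first, then sort: the strings are collated as text,
# so the numeric order is generally not preserved.
by_string <- sort(unique(as.character(x)))

print(by_value)
print(by_string)
```

Comparing the two printed vectors shows why the sort has to happen on the
original values.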

Let me suggest considering the following modification, where match() is done
on the strings, not on the original values:
  levels <- unique(as.character(sort(unique(x))))
  x <- as.character(x)
  f <- match(x, levels)

Since unique() preserves the order of first occurrence, the levels stay
correctly ordered. Because unique() is applied a second time, after
as.character(), there are no duplicated levels.
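
To illustrate, here is a sketch on made-up data. It assumes that
as.character() renders 0.3 and 0.1 + 0.2 as the same string although the
two doubles are not equal; the values are chosen only for illustration.

```r
# Two doubles that differ, but are assumed to share one string form.
x <- c(0.3, 0.1 + 0.2, 0.2)

# Levels from the sorted distinct values, de-duplicated as strings.
levels <- unique(as.character(sort(unique(x))))

# Match on the string representations instead of the original values.
f <- match(as.character(x), levels)

print(levels)
print(f)
print(anyDuplicated(levels))   # 0 means no duplicated levels
```

Under that assumption, the two nearly equal values should end up with the
same code in f, and no element of f should be NA.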

Is this correct?

Petr.


