[Rd] duplicated factor labels.

Paul Johnson pauljohn32 at gmail.com
Fri Jun 16 18:02:34 CEST 2017


On Fri, Jun 16, 2017 at 2:35 AM, Joris Meys <jorismeys at gmail.com> wrote:
> To extend Martin's explanation:
>
> In factor(), levels are the unique input values and labels the unique output
> values. So the function levels() actually displays the labels.
>

Dear Joris

I think we agree. Currently, factor insists both levels and labels be unique.

I wish that it would accept nonunique labels. I also understand it
is impractical to change this now in base R.

I don't think I succeeded in explaining why this would be nicer.
Here's another example. Fairly often, we see input data like

x <- c("Male", "Man", "male", "Man", "Female")

The first four represent the same value. I'd like to go in one step
to a new factor variable with enumerated types "Male" and "Female".
This fails:

xf <- factor(x, levels = c("Male", "Man", "male", "Female"),
        labels = c("Male", "Male", "Male", "Female"))

Instead, we need two steps:

xf <- factor(x, levels = c("Male", "Man", "male", "Female"))
levels(xf) <- c("Male", "Male", "Male", "Female")

I think it is quirky that `levels<-.factor` allows duplicated
labels, whereas factor() does not.
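For comparison, a one-step recode is possible in base R by indexing a named character vector with the raw values and then calling factor() with only the canonical levels. This is a sketch that assumes every spelling in x appears in the lookup table (an unlisted spelling would become NA); the name `recode` is illustrative only.

```r
x <- c("Male", "Man", "male", "Man", "Female")

# Illustrative lookup table: names are the raw spellings,
# values are the canonical labels (duplicates are fine here)
recode <- c(Male = "Male", Man = "Male", male = "Male", Female = "Female")

# Index by the raw strings, drop the names, and build the factor
# with only the two canonical levels, in the desired order
xf <- factor(unname(recode[x]), levels = c("Male", "Female"))

print(levels(xf))
print(table(xf))
```

This sidesteps the unique-labels restriction because the duplicates live in the lookup vector rather than in factor()'s labels argument.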

I wrote a function rockchalk::combineLevels to simplify combining
levels, but most of the students here like plyr::mapvalues to do it.
Using `levels<-` can be tricky because one must enumerate all
levels, not just the ones being changed.
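A minimal sketch of a helper that renames only the levels being changed, relying on the fact that `levels<-` merges duplicated values. The name `merge_levels` and its `from`/`to` arguments are hypothetical; this is not the interface of rockchalk::combineLevels or plyr::mapvalues.

```r
# Hypothetical helper: rename only the levels listed in 'from' to the
# corresponding entries of 'to'; all other levels are left unchanged.
merge_levels <- function(f, from, to) {
  stopifnot(is.factor(f), length(from) == length(to))
  lev <- levels(f)
  idx <- match(from, lev)
  if (anyNA(idx)) stop("some 'from' values are not levels of 'f'")
  lev[idx] <- to
  # `levels<-` accepts duplicated values and merges those levels
  levels(f) <- lev
  f
}

x <- c("Male", "Man", "male", "Man", "Female")
xf <- merge_levels(factor(x), from = c("Man", "male"), to = c("Male", "Male"))
print(table(xf))
```

The order of the resulting levels follows the (locale-dependent) sort order of the original levels. Base R's `levels<-` also accepts a named list, e.g. `levels(xf) <- list(Male = c("Male", "Man", "male"), Female = "Female")`, which groups spellings without repeating the new label, although it still lists every level.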

But I do understand Martin's point. It's been this way for 25 years; it
won't change. :)

> Cheers
> Joris
>
>


-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.


