[Rd] duplicated factor labels.

Joris Meys jorismeys at gmail.com
Fri Jun 16 18:24:34 CEST 2017

Hi Paul,

Now I see what you're getting at. I misread your original mail completely.
So we definitely agree, and wholeheartedly even.

The use case you just gave is definitely in my top 5 frustrations with
R. I would like to be able to assign the same label to multiple levels
without having to use e.g. dplyr::recode_factor() or some other vectorized
switch statement to recode all data first.
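
For illustration, a minimal base R sketch of the kind of workaround this
currently requires, using the input vector from the example quoted below.
It relies on the named-list form of `levels<-`; the variable names are
only illustrative, and this is one possible workaround rather than the
only one.

```r
# Input vector taken from the example quoted further down in this mail.
x <- c("Male", "Man", "male", "Man", "Female")

# Step 1: build the factor with one level per distinct input value.
xf <- factor(x, levels = c("Male", "Man", "male", "Female"))

# Step 2: the named-list form of `levels<-` merges several old levels
# into one new level. Old levels left out of the list would become NA,
# so every level still has to be listed.
levels(xf) <- list(Male = c("Male", "Man", "male"), Female = "Female")

print(levels(xf))
print(table(xf))
```

The point of the request is that step 2 should not be needed: the merge
could be expressed directly in the factor() call.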

I understand "it's been like that 25 years", but I've looked hard to find a
use case where adding this behaviour would invalidate existing code, and I
couldn't come up with one.

So I add my (totally insignificant) vote for adding the possibility of
assigning the same label to multiple levels in factor() itself.

Cheers and thank you for bringing this up!

On Fri, Jun 16, 2017 at 6:02 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:

> On Fri, Jun 16, 2017 at 2:35 AM, Joris Meys <jorismeys at gmail.com> wrote:
> > To extend on Martin's explanation:
> >
> > In factor(), levels are the unique input values and labels the unique
> > output values. So the function levels() actually displays the labels.
> >
> Dear Joris
> I think we agree. Currently, factor insists both levels and labels be
> unique.
> I wish that it would accept nonunique labels. I also understand it
> is impractical to change this now in base R.
> I don't think I succeeded in explaining why this would be nicer.
> Here's another example. Fairly often, we see input data like
> x <- c("Male", "Man", "male", "Man", "Female")
> The first four represent the same value.  I'd like to go in one step
> to a new factor variable with enumerated types "Male" and "Female".
> This fails:
> xf <- factor(x, levels = c("Male", "Man", "male", "Female"),
>         labels = c("Male", "Male", "Male", "Female"))
> Instead, we need 2 steps:
> xf <- factor(x, levels = c("Male", "Man", "male", "Female"))
> levels(xf) <- c("Male", "Male", "Male", "Female")
> I think it is quirky that `levels<-.factor` allows the duplicated
> labels, whereas factor does not.
> I wrote a function rockchalk::combineLevels to simplify combining
> levels, but most of the students here like plyr::mapvalues to do it.
> The use of levels() can be tricky because one must enumerate all
> values, not just the ones being changed.
> But I do understand Martin's point. It's been this way 25 years, it
> won't change. :)
> > Cheers
> > Joris
> >
> >
> --
> Paul E. Johnson   http://pj.freefaculty.org
> Director, Center for Research Methods and Data Analysis
> http://crmda.ku.edu
> To write to me directly, please address me at pauljohn at ku.edu.
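
To make the levels/labels distinction quoted above concrete, here is a
small sketch with made-up input values ("lo", "hi") that are not from the
thread. The levels argument names the input values to match, the labels
argument names what they are turned into, and levels() then reports the
labels.

```r
# Hypothetical input values, chosen only for illustration.
x <- c("lo", "hi", "lo")

# `levels` lists the input values; `labels` gives the output names.
f <- factor(x, levels = c("lo", "hi"), labels = c("Low", "High"))

print(levels(f))        # shows the labels, not the original input values
print(as.character(f))  # the data as relabelled
```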

Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel :  +32 (0)9 264 61 79
Joris.Meys at Ugent.be
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
