[Rd] duplicated factor labels.

Thu Jun 15 17:15:17 CEST 2017

>>>>> Paul Johnson <pauljohn32 at gmail.com>
>>>>>     on Wed, 14 Jun 2017 19:00:11 -0500 writes:

    > Dear R devel
    > I've been wondering about this for a while. I am sorry to ask for your
    > time, but can one of you help me understand this?

    > This concerns duplicated labels, not levels, in the factor function.

    > I think it is hard to understand that factor() fails, but levels()
    > after does not

    >> x <- 1:6
    >> xlevels <- 1:6
    >> xlabels <- c(1, NA, NA, 4, 4, 4)
    >> y <- factor(x, levels = xlevels, labels = xlabels)
    > Error in `levels<-`(`*tmp*`, value = if (nl == nL)
    > as.character(labels) else paste0(labels,  :
    > factor level [3] is duplicated
    >> y <- factor(x, levels = xlevels)
    >> levels(y) <- xlabels
    >> y
    > [1] 1    <NA> <NA> 4    4    4
    > Levels: 1 4

    > If the latter use of levels() gives a good, expected result, couldn't
    > factor(..., labels = xlabels) be made to do the same thing?

I may misunderstand, but I think you are confusing 'labels' and 'levels'
here (and you are not alone in this!), mostly because R's factor()
function treats them as arguments in a way that can be confusing.
But I don't think we'd want to change that: it has been documented
and in use for more than 25 years (in S, S+, and R).
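
To make the distinction concrete, here is a minimal sketch with made-up
data (the vector and the label names are illustrative only, not from the
original post): 'levels' names the values of x to be recognised, while
'labels' only renames the levels of the result.

```r
# Illustrative data, not taken from the original post.
x <- c("lo", "hi", "lo", "mid")

# 'levels' says which values of x are recognised, and in which order;
# 'labels' says what the resulting levels are called.
f <- factor(x, levels = c("lo", "mid", "hi"),
            labels = c("Low", "Medium", "High"))

levels(f)      # the labels have become the levels of the result
as.integer(f)  # integer codes that index into levels(f)
```

So 'labels' never appears in x at all; it only determines what
levels(f) returns afterwards.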

Note that after your second approach above (levels(y) <- xlabels),

> dput(y)
structure(c(1L, NA, NA, 2L, 2L, 2L), .Label = c("1", "4"), class = "factor")

and that of course _is_ a valid factor, which you can easily
get directly via, e.g.,

> identical(y, factor(c(1,NA,NA,4,4,4)))
[1] TRUE

or also via

> identical(y, factor(c("1",NA,NA,"4","4","4")))
[1] TRUE
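
If the two-step route is wanted repeatedly, it could be wrapped in a
small helper. This is only a sketch: factor_merged() is a hypothetical
name, not part of base R, and it relies solely on the `levels<-`
behaviour shown earlier in this thread, so it does not depend on how
factor() itself treats duplicated labels.

```r
# Hypothetical helper (not part of base R): build the factor first,
# then rename its levels and let `levels<-` merge the duplicates.
factor_merged <- function(x, levels, labels) {
  f <- factor(x, levels = levels)
  # Duplicated labels collapse into one level; NA labels turn the
  # corresponding values into NA.
  levels(f) <- labels
  f
}

y2 <- factor_merged(1:6, levels = 1:6, labels = c(1, NA, NA, 4, 4, 4))
levels(y2)
identical(y2, factor(c(1, NA, NA, 4, 4, 4)))
```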

I really don't see a need for a change of factor().
It should remain as simple as possible (but not simpler :-).

Martin