# [Rd] duplicated factor labels.

Martin Maechler maechler at stat.math.ethz.ch
Thu Jun 15 17:15:17 CEST 2017

```
>>>>> Paul Johnson <pauljohn32 at gmail.com>
>>>>>     on Wed, 14 Jun 2017 19:00:11 -0500 writes:

> Dear R devel
> time, but can one of you help me understand this?

> This concerns duplicated labels, not levels, in the factor function.

> I think it is hard to understand that factor() fails, but levels()
> after does not

>> x <- 1:6
>> xlevels <- 1:6
>> xlabels <- c(1, NA, NA, 4, 4, 4)
>> y <- factor(x, levels = xlevels, labels = xlabels)
> Error in `levels<-`(`*tmp*`, value = if (nl == nL)
> as.character(labels) else paste0(labels,  :
> factor level [3] is duplicated
>> y <- factor(x, levels = xlevels)
>> levels(y) <- xlabels
>> y
> [1] 1    <NA> <NA> 4    4    4
> Levels: 1 4

> If the latter use of levels() causes a good, expected result, couldn't
> factor(..., labels = xlabels) be made to do the same thing?

I may misunderstand, but I think you are confusing 'labels' and 'levels'
here (and you are not alone in this!), mostly because R's
factor() function treats them as arguments in a way that can be
confusing (but I don't think we'd want to change that; it's
been documented and in use for more than 25 years, in S, S+, and R).

Note that after the above,

> dput(y)
structure(c(1L, NA, NA, 2L, 2L, 2L), .Label = c("1", "4"), class = "factor")

and that of course _is_ a valid factor, which you can easily
get directly via e.g.

> identical(y, factor(c(1,NA,NA,4,4,4)))
[1] TRUE

or also  via

> identical(y, factor(c("1",NA,NA,"4","4","4")))
[1] TRUE

I really don't see a need for a change of factor().
It should remain as simple as possible (but not simpler :-).

Martin

```
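The two routes discussed in the message can be replayed with the minimal sketch below. It reuses the variable names from the quoted example; `res` is a hypothetical helper name introduced here. Whether `factor(..., labels = xlabels)` stops with an error may depend on the R version, so that call is wrapped in `tryCatch()` and its outcome is not assumed.

```r
x <- 1:6
xlevels <- 1:6
xlabels <- c(1, NA, NA, 4, 4, 4)

# Route 1: pass the duplicated labels straight to factor().
# The post reports an error here; other R versions may behave differently,
# so capture the outcome instead of assuming it.
res <- tryCatch(
  factor(x, levels = xlevels, labels = xlabels),
  error = function(e) conditionMessage(e)
)

# Route 2: build the factor first, then collapse its levels via levels<-.
y <- factor(x, levels = xlevels)
levels(y) <- xlabels

# Inspect the result: integer codes plus the surviving levels.
print(levels(y))
print(as.integer(y))

# As noted in the reply, the same factor can be built directly from the
# values themselves.
print(identical(y, factor(c(1, NA, NA, 4, 4, 4))))
```

Route 2 merges the old levels that share a label and maps those relabelled `NA` to missing values, which is why `y` ends up with only the levels "1" and "4".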