[Rd] 'droplevels' inappropriate change

Martin Maechler maechler at stat.math.ethz.ch
Mon Aug 22 12:30:28 CEST 2016

```
>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>>     on Sun, 21 Aug 2016 10:44:18 +0000 writes:

> In R devel r71124, if 'x' is a factor, droplevels(x) gives
> factor(x, exclude = NULL) .  In R 3.3.1, it gives
> factor(x) .

> If a factor 'x' has NA and levels of 'x' don't contain
> NA, factor(x) gives the expected result for droplevels(x)
> , but factor(x, exclude = NULL) doesn't. As I said in
> https://stat.ethz.ch/pipermail/r-devel/2016-May/072796.html
> , factor(x, exclude = NULL) adds NA as a level.

> Using factor(x, exclude = if(anyNA(levels(x))) NULL else NA ) ,
> like in the code of function `[.factor` (in the
> same file, factor.R, as 'droplevels'), is better.  It is
> possible just to use x[, drop = TRUE] .

You are right.  The change to droplevels() [in svn rev 71113 ]
was not thorough enough, and I will commit a change that uses

factor(x, exclude = if(anyNA(levels(x))) NULL else NA )

------

> For a factor 'x' that has NA level and also NA value,

i.e., one like this ?

x <- factor(c(1, 2, NA, NA), exclude = NULL) ; is.na(x)[2] <- TRUE
x # << two "different" NA's (in codes | w/ level) looking the same in print()
stopifnot(identical(x, structure(as.integer(c(1, NA, 3, 3)),
.Label = c("1", "2", NA), class = "factor")))

> factor(x, exclude = NULL) is not perfect, though. It
> changes NA to be associated with NA factor level.

yes, it does, but why is that not good?
The result of calling factor() on a factor 'f' should either be 'f'
*or* a more regular version of 'f'.

Now, for the above 'x' --- which I call "pathological", as it
has two kinds of NA's but the user does not easily see that ---
I am happy that both

factor(x)               # and
factor(x, exclude = NULL)

produce a "regularized" version of x:

> dput(x)
structure(c(1L, NA, 3L, 3L), .Label = c("1", "2", NA), class = "factor")
> dput(factor(x))
structure(c(1L, NA, NA, NA), .Label = "1", class = "factor")
> dput(factor(x, exclude=NULL))
structure(c(1L, 2L, 2L, 2L), .Label = c("1", NA), class = "factor")
>

```
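For readers who want to try the more common case from the thread (a factor with an NA value but no NA level), here is a minimal sketch in base R. The helper name `drop_unused` and the example vectors are assumptions made for illustration; only the `exclude = if(anyNA(levels(x))) NULL else NA` expression comes from the discussion above, and this is not claimed to be the code that was committed.

```r
# Factor with an NA value, an unused level "c", and no NA level.
x <- factor(c("a", "b", NA), levels = c("a", "b", "c"))

# Hypothetical helper wrapping the expression proposed in the thread:
# keep NA as a level only if it already is one.
drop_unused <- function(f) {
  factor(f, exclude = if (anyNA(levels(f))) NULL else NA)
}

lev_default <- levels(factor(x))                 # drops unused "c"
lev_null    <- levels(factor(x, exclude = NULL)) # also adds NA as a level
lev_helper  <- levels(drop_unused(x))            # same as the default here

# Factor that already has NA as a level (plus an unused level "b").
y <- factor(c("a", NA), levels = c("a", "b", NA), exclude = NULL)
lev_helper_y <- levels(drop_unused(y))           # drops "b", keeps the NA level

print(lev_default)
print(lev_null)
print(lev_helper)
print(lev_helper_y)
```

Under these assumptions the helper agrees with `factor(x)` when NA is not a level, and preserves an existing NA level when it is one, which is the behaviour the thread asks of droplevels().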