[Rd] 'droplevels' inappropriate change

Mon Aug 22 12:30:28 CEST 2016

>>>>> Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>>     on Sun, 21 Aug 2016 10:44:18 +0000 writes:

    > In R devel r71124, if 'x' is a factor, droplevels(x) gives
    > factor(x, exclude = NULL) .  In R 3.3.1, it gives
    > factor(x) .

    > If a factor 'x' has NA and the levels of 'x' don't contain
    > NA, factor(x) gives the expected result for droplevels(x),
    > but factor(x, exclude = NULL) doesn't. As I said in
    > https://stat.ethz.ch/pipermail/r-devel/2016-May/072796.html ,
    > factor(x, exclude = NULL) adds NA as a level.

    > Using factor(x, exclude = if(anyNA(levels(x))) NULL else NA ) , 
    > like in the code of function `[.factor` (in the
    > same file, factor.R, as 'droplevels'), is better.  It is
    > possible just to use x[, drop = TRUE] .

You are right.  The change to droplevels() [in svn rev 71113 ]
was not thorough enough, and I will commit a change that uses

    factor(x, exclude = if(anyNA(levels(x))) NULL else NA )
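
To make the difference concrete, here is a minimal sketch of the
three variants on a factor that has an NA value but no NA level.
The example data and the helper name drop_lev() are made up for
illustration; this is not the committed code.

```r
## Factor with an NA value, an unused level "c", and no NA among its levels.
x <- factor(c("a", "b", NA), levels = c("a", "b", "c"))

## R 3.3.1 behaviour: the unused level is dropped, NA stays a missing value.
lev_default <- levels(factor(x))

## r71124 behaviour: NA is additionally turned into a level.
lev_null <- levels(factor(x, exclude = NULL))

## Proposed rule: keep NA as a level only if it already is one.
## (drop_lev is a hypothetical helper name, used here for illustration only.)
drop_lev <- function(f)
    factor(f, exclude = if (anyNA(levels(f))) NULL else NA)
lev_proposed <- levels(drop_lev(x))

print(lev_default)
print(lev_null)
print(lev_proposed)
```

With this input the proposed rule should agree with plain factor(x),
while factor(x, exclude = NULL) ends up with an extra NA level.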

------

    > For a factor 'x' that has NA level and also NA value,

i.e., one like this?

x <- factor(c(1, 2, NA, NA), exclude = NULL) ; is.na(x)[2] <- TRUE
x # << two "different" NA's (in codes | w/ level) looking the same in print()
stopifnot(identical(x, structure(as.integer(c(1, NA, 3, 3)),
				 .Label = c("1", "2", NA), class = "factor")))
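
Since print() shows both kinds of NA the same way, a small sketch
of how one might tell them apart may help.  The variable names are
only illustrative.

```r
## Rebuild the factor from above: element 2 has an NA code,
## elements 3 and 4 point at the NA level.
x <- factor(c(1, 2, NA, NA), exclude = NULL)
is.na(x)[2] <- TRUE

## NA in the integer codes (what is.na() on a factor looks at).
na_code <- is.na(x)

## Element refers to the NA *level*; NA where the code itself is missing.
na_level <- is.na(levels(x))[as.integer(x)]

## After conversion to character the two kinds can no longer be told apart.
na_char <- is.na(as.character(x))

print(na_code)
print(na_level)
print(na_char)
```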

    > factor(x, exclude = NULL) is not perfect, though. It
    > changes NA to be associated with the NA factor level.

yes, it does, but why is that not good?
The result of calling factor() on a factor 'f' should either be 'f'
*or* a more regular version of 'f'.

Now, for the above 'x' --- which I call "pathological", as it
has two kinds of NA's but the user does not easily see that ---
I am happy that both

  factor(x)               # and
  factor(x, exclude = NULL)

produce a "regularized" version of x:

  > dput(x)
  structure(c(1L, NA, 3L, 3L), .Label = c("1", "2", NA), class = "factor")
  > dput(factor(x))
  structure(c(1L, NA, NA, NA), .Label = "1", class = "factor")
  > dput(factor(x, exclude=NULL))
  structure(c(1L, 2L, 2L, 2L), .Label = c("1", NA), class = "factor")
  >
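
As a rough check of the "more regular version" idea, the sketch
below applies the proposed exclude rule to the same 'x' and then
regularizes a second time.  It assumes the factor() behaviour shown
in the dput() output above; the variable names are illustrative.

```r
## The "pathological" factor: one NA code plus an NA level.
x <- factor(c(1, 2, NA, NA), exclude = NULL)
is.na(x)[2] <- TRUE

## anyNA(levels(x)) is TRUE here, so the proposed rule picks exclude = NULL.
excl <- if (anyNA(levels(x))) NULL else NA
r <- factor(x, exclude = excl)

## Compare with the dput() output of factor(x, exclude = NULL) shown above.
same_as_shown <- identical(r, structure(c(1L, 2L, 2L, 2L),
                                        levels = c("1", NA),
                                        class = "factor"))

## Regularizing an already regular factor should not change it.
stable <- identical(factor(r, exclude = NULL), r)

print(same_as_shown)
print(stable)
```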