[Rd] 'droplevels' inappropriate change

Martin Maechler maechler at stat.math.ethz.ch
Wed Aug 31 09:51:55 CEST 2016


>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Sat, 27 Aug 2016 18:55:37 +0200 writes:

>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>>     on Sat, 27 Aug 2016 03:17:32 +0000 writes:

    >> In R devel r71157, 'droplevels' documentation, in "Arguments" section, says this about argument 'exclude'.
    >> passed to factor(); factor levels which should be excluded from the result even if present.  Note that this was implicitly NA in R <= 3.3.1 which did drop NA levels even when present in x, contrary to the documentation.  The current default is compatible with x[ , drop=FALSE].

    >> The part
    >> x[ , drop=FALSE]
    >> should be
    >> x[ , drop=TRUE]

    > Yes, definitely, thank you!
    > a "typo" by me. .. fixed now.

    >> Saying that 'exclude' is factor levels is not quite true for NA element. NA may be not an original level, but NA in 'exclude' affects the result.

    >> For a factor 'x', factor(x, exclude = exclude) doesn't really work for excluding in general. See, for example, https://stat.ethz.ch/pipermail/r-help/2005-September/079336.html .
    >> factor(factor(c("a","b","c")), exclude="c")

    >> However, this excludes "2":
    >> factor(factor(2:3), exclude=2)

    >> Rather unexpectedly, this excludes NA:
    >> factor(factor(c("a",NA), exclude=NULL), exclude="c")

    >> For a factor 'x', factor(x, exclude = exclude) can only exclude integer-like or NA levels. An explanation is in https://stat.ethz.ch/pipermail/r-help/2011-April/276274.html .

    > Well, Peter Dalgaard (in that R-devel e-mail, a bit more than 5
    > years ago) is confirming the problem there,  and suggesting (as
    > you, right?) that actually   `factor()` is not behaving
    > correctly here.

    > And your persistence is finally getting close to convince me
    > that it is not just droplevels(), but  factor() itself which
    > needs care here.

    > Interestingly, the following patch *does* pass 'make check-all'
    > (after small change in tests/reg-tests-1b.R which is ok),
    > and leads to behavior which is much closer to the documentation,
    > notably for your two examples above would give what one would
    > expect.

    > (( If the R-Hub would support experiments with branches of R-devel 
    > from R-core members,  I could just create such a branch and R Hub
    > would run 'R CMD check <pkg>'  for thousands of CRAN packages
    > and provide a web page with the *differences* in the package
    > check results ... so we could see ... ))

    > I do agree that we should strongly consider such a change.

as nobody has commented, I've been liberal and have taken these
no comments as consent.

Hence I have committed

------------------------------------------------------------------------
r71178 | maechler | 2016-08-31 09:45:40 +0200 (Wed, 31 Aug 2016) | 1 line
Changed paths:
   M /trunk/doc/NEWS.Rd
   M /trunk/src/library/base/R/factor.R
   M /trunk/src/library/base/man/factor.Rd
   M /trunk/tests/reg-tests-1b.R
   M /trunk/tests/reg-tests-1c.R

factor(x, exclude) more "rational" when x or exclude are character
------------------------------------------------------------------------

which apart from documentation, examples, and regression tests
is just the patch below.

Martin Maechler
ETH Zurich


    > --- factor.R	(revision 71157)
    > +++ factor.R	(working copy)
    > @@ -28,8 +28,12 @@
    > levels <- unique(y[ind])
    > }
    > force(ordered) # check if original x is an ordered factor
    > -    exclude <- as.vector(exclude, typeof(x)) # may result in NA
    > -    x <- as.character(x)
    > +    if(!is.character(x)) {
    > +	if(!is.character(exclude))
    > +	    exclude <- as.vector(exclude, typeof(x)) # may result in NA
    > +	x <- as.character(x)
    > +    } else
    > +	exclude <- as.vector(exclude, typeof(x)) # may result in NA
    > ## levels could be a long vectors, but match will not handle that.
    > levels <- levels[is.na(match(levels, exclude))]
    > f <- match(x, levels)



More information about the R-devel mailing list