[Rd] ifelse() woes ... can we agree on a ifelse2() ?

Martin Maechler maechler at stat.math.ethz.ch
Tue Nov 22 11:12:51 CET 2016


>>>>> Gabriel Becker <gmbecker at ucdavis.edu>
>>>>>     on Tue, 15 Nov 2016 11:56:04 -0800 writes:

    > All,
    > Martin: Thanks for this and all the other things you are doing to both
    > drive R forward and engage more with the community about things like this.

    > Apologies for missing this discussion the first time it came around and if
    > anything here has already been brought up, but I wonder what exactly you
    > mean when you want recycling behavior.

Thank you, Gabe.

Note that my premise was really to get *away* from inheriting
too much from 'test'.
Hence, I have *not* been talking about replacing ifelse() but
rather of providing a new  ifelse2()

       [ or if_else()  if Hadley was willing to ditch the dplyr one
       	  	       in favor of a base one]

    > Specifically, based on an unrelated discussion with Henrik Bengtsson on
    > Twitter, I wonder if preserving the recycling behavior test is longer than
    > yes, no, but making the case where

    > length( test ) < max(length( yes ), length( no ))

    > would simplify usage for userRs in a useful way.

I'm sorry I don't understand the sentence above.

    > I suspect it's easy to
    > forget that the result is not guaranteed to be the length of  test, even
    > for intermediate and advanced users familiar with ifelse and it's
    > strengths/weaknesses.

    > I certainly agree (for what that's worth...) that

    > x = rnorm(100)

    > y = ifelse2(x > 0, 1L, 2L)

    > should continue to work.

(and give a a length 10 result).
Also
	ifelse2(x > 0, sqrt(x), 0L)

should work even though  class(sqrt(x)) is "numeric" and the one
of 0L is "integer", and I'd argue

	ifelse2(x < 0, sqrt(x + 0i), sqrt(x))

should also continue to work as with ifelse().

    > Also, If we combine a stricter contract that the output will always be of
    > length with the suggestion of a specified output class 

that was not my intent here.... but would be another interesting
extension. However, I would like to keep  R-semantic silent coercions
such as
	  logical < integer < double < complex

and your pseudo code below would not work so easily I think.

    > the pseudo code could be

(I'm changing assignment '=' to  '<-' ...  [please!] )

    > ifelse2 <- function(test, yes, no, outclass) {
    >   lenout  <- length(test)
    >   out <- as( rep(yes, length.out <- lenout), outclass)
    >   out[!test] <- as(rep(no, length.out = lenout)[!test], outclass)
    >   # handle NA stuff
    >   out
    > }


    > NAs could be tricky if outclass were allowed to be completely general, but
    > doable, I think? Another approach  if we ARE fast-passing while leaving
    > ifelse intact is that maybe NA's in test just aren't allowed in ifelse2.
    > I'm not saying we should definitely do that, but it's possible and would
    > make things faster.

    > Finally, In terms of efficiency, with the stuff that Luke and I are working
    > on, the NA detection could be virtually free in certain cases, which could
    > give a nice boost for long vectors  that don't have any NAs (and 'know'
    > that they don't).

That *is* indeed a very promising prospect!
Thank you in advance! 

    > Best,
    > ~G

I still am bit disappointed by the fact that it seems nobody has
taken a good look at my ifelse2() proposal.

I really would like an alternative to ifelse() in *addition* to
the current ifelse(), but hopefully in the future being used in
quite a few places instead of ifelse()
efficiency but for changed semantics, namely working for considerably
more "vector like" classes of  'yes' and 'no'  than the current
ifelse().

As I said, the current proposal works for objects of class
   "Date", "POSIXct", "POSIXlt", "factor",  "mpfr" (pkg 'Rmpfr')
and hopefully for "sparseVector" (in a next version of the 'Matrix' pkg).

Martin

    > On Tue, Nov 15, 2016 at 3:58 AM, Martin Maechler <maechler at stat.math.ethz.ch
    >> wrote:

    >> Finally getting back to this :
    >> 
    >> >>>>> Hadley Wickham <h.wickham at gmail.com>
    >> >>>>>     on Mon, 15 Aug 2016 07:51:35 -0500 writes:
    >> 
    >> > On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham
    >> > <h.wickham at gmail.com> wrote:
    >> >>> >> One possibility would also be to consider a
    >> >>> "numbers-only" or >> rather "same type"-only {e.g.,
    >> >>> would also work for characters} >> version.
    >> >>>
    >> >>> > I don't know what you mean by these.
    >> >>>
    >> >>> In the mean time, Bob Rudis mentioned dplyr::if_else(),
    >> >>> which is very relevant, thank you Bob!
    >> >>>
    >> >>> As I have found, that actually works in such a "same
    >> >>> type"-only way: It does not try to coerce, but gives an
    >> >>> error when the classes differ, even in this somewhat
    >> >>> debatable case :
    >> >>>
    >> >>> > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11) Error:
    >> >>> `false` has type 'double' not 'integer'
    >> >>> >
    >> >>>
    >> >>> As documented, if_else() is clearly stricter than
    >> >>> ifelse() and e.g., also does no recycling (but of
    >> >>> length() 1).
    >> >>
    >> >> I agree that if_else() is currently too strict - it's
    >> >> particularly annoying if you want to replace some values
    >> >> with a missing:
    >> >>
    >> >> x <- sample(10) if_else(x > 5, NA, x) # Error: `false`
    >> >> has type 'integer' not 'logical'
    >> >>
    >> >> But I would like to make sure that this remains an error:
    >> >>
    >> >> if_else(x > 5, x, "BLAH")
    >> >>
    >> >> Because that seems more likely to be a user error (but
    >> >> reasonable people might certainly believe that it should
    >> >> just work)
    >> >>
    >> >> dplyr is more accommodating in other places (i.e. in
    >> >> bind_rows(), collapse() and the joins) but it's
    >> >> surprisingly hard to get all the details right. For
    >> >> example, what should the result of this call be?
    >> >>
    >> >> if_else(c(TRUE, FALSE), factor(c("a", "b")),
    >> >> factor(c("c", "b"))
    >> >>
    >> >> Strictly speaking I think you could argue it's an error,
    >> >> but that's not very user-friendly. Should it be a factor
    >> >> with the union of the levels? Should it be a character
    >> >> vector + warning? Should the behaviour change if one set
    >> >> of levels is a subset of the other set?
    >> >>
    >> >> There are similar issues for POSIXct (if the time zones
    >> >> are different, which should win?), and difftimes
    >> >> (similarly for units).  Ideally you'd like the behaviour
    >> >> to be extensible for new S3 classes, which suggests it
    >> >> should be a generic (and for the most general case, it
    >> >> would need to dispatch on both arguments).
    >> 
    >> > One possible principle would be to use c() -
    >> > i.e. construct out as
    >> 
    >> > out <- c(yes[0], no[0]
    >> > length(out) <- max(length(yes), length(no))
    >> 
    >> yes; this would require that a  `length<-` method works for the
    >> class of the result.
    >> 
    >> Duncan Murdoch mentioned a version of this, in his very
    >> first reply:
    >> 
    >> ans <- c(yes, no)[seq_along(test)]
    >> ans <- ans[seq_along(test)]
    >> 
    >> which is less efficient for atomic vectors, but requires
    >> less from the class: it "only" needs `c` and `[` to work
    >> 
    >> and a mixture of your two proposals would be possible too:
    >> 
    >> ans <- c(yes[0], no[0])
    >> ans <- ans[seq_along(test)]
    >> 
    >> which does *not* work for my "mpfr" numbers (CRAN package 'Rmpfr'),
    >> but that's a buglet in the  c.mpfr() implementation of my Rmpfr
    >> package... (which has already been fixed in the development version on
    >> R-forge,
    >> https://r-forge.r-project.org/R/?group_id=386)
    >> 
    >> > But of course that wouldn't help with factor responses.
    >> 
    >> Yes.  However, a version of Duncan's suggestion -- of treating 'yes' first
    >> -- does help in that case.
    >> 
    >> For once, mainly as "feasability experiment",
    >> I have created a github gist to make my current ifelse2() proposal
    >> available
    >> for commenting, cloning, pullrequesting, etc:
    >> 
    >> Consisting of 2 files
    >> - ifelse-def.R :  Functions definitions only, basically all the current
    >> proposals, called  ifelse*()
    >> - ifelse-checks.R : A simplistic checking function
    >> and examples calling it, notably demonstrating that my
    >> ifelse2()  does work with
    >> "Date", <dateTime> (i.e. "POSIXct" and "POSIXlt"), factors,
    >> and "mpfr" (the arbitrary-precision numbers in my package "Rmpfr")
    >> 
    >> Also if you are not on github, you can quickly get to the ifelse2()
    >> definition :
    >> 
    >> https://gist.github.com/mmaechler/9cfc3219c4b89649313bfe6853d878
    >> 94#file-ifelse-def-r-L168
    >> 
    >> > Also, if you're considering an improved ifelse(), I'd
    >> > strongly urge you to consider adding an `na` argument,
    >> 
    >> I now did (called it 'NA.').
    >> 
    >> > so that you can use ifelse() to transform all three
    >> > possible values in a logical vector.
    >> 
    >> > Hadley
    >> > -- http://hadley.nz
    >> 
    >> For those who really hate GH (and don't want or cannot easily follow the
    >> above URL), here's my current definition:
    >> 
    >> 
    >> ##' Martin Maechler, 14. Nov 2016 --- taking into account Duncan M. and
    >> Hadley's
    >> ##' ideas in the R-devel thread starting at (my mom's 86th birthday):
    >> ##' https://stat.ethz.ch/pipermail/r-devel/2016-August/072970.html
    >> ifelse2 <- function (test, yes, no, NA. = NA) {
    >> if(!is.logical(test)) {
    >> if(is.atomic(test))
    >> storage.mode(test) <- "logical"
    >> else ## typically a "class"; storage.mode<-() typically fails
    >> test <- if(isS4(test)) methods::as(test, "logical") else
    >> as.logical(test)
    >> }
    >> 
    >> ## No longer optimize the  "if (a) x else y"  cases:
    >> ## Only "non-good" R users use ifelse(.) instead of if(.) in these
    >> cases.
    >> 
    >> ans <-
    >> tryCatch(rep(if(is.object(yes) && identical(class(yes), class(no)))
    >> ## as c(o) or o[0] may not work for the class
    >> yes else c(yes[0], no[0]), length.out =
    >> length(test)),
    >> error = function(e) { ## try asymmetric, yes-leaning
    >> r <- yes
    >> r[!test] <- no[!test]
    >> r
    >> })
    >> ok <- !(nas <- is.na(test))
    >> if (any(test[ok]))
    >> ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]
    >> if (any(!test[ok]))
    >> ans[!test & ok] <- rep(no, length.out = length(ans))[!test & ok]
    >> ans[nas] <- NA. # possibly coerced to class(ans)
    >> ans
    >> }
    >> 
    >> ______________________________________________
    >> R-devel at r-project.org mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel
    >> 



    > -- 
    > Gabriel Becker, PhD
    > Associate Scientist (Bioinformatics)
    > Genentech Research

    > [[alternative HTML version deleted]]



More information about the R-devel mailing list