[R] inefficient ifelse() ?

rex.dwyer at syngenta.com
Wed Mar 2 16:53:46 CET 2011


Hi Ivo,
It might be useful for you to study the examples below.
The key point, from a programming-language point of view, is that functions like ifelse operate on whole vectors, not on individual elements of vectors.  R either evaluates an argument or it doesn't; it never evaluates only part of an argument.  (Somebody correct me if I'm wrong.)
As you can see from the examples, if the condition contains no TRUEs or no FALSEs, the corresponding arm is not evaluated, but if there are some of each, both arms must be evaluated in full.  This is a property of the entire condition vector.  You can see all this if you type ifelse (not ?ifelse, just ifelse) and look at the definition.
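To make the point about the condition vector concrete, here is a rough sketch of the logic, not the actual base-R source.  The name ifelse_sketch is made up for illustration, and the real definition handles attributes and other details that are left out here.  Thanks to lazy evaluation, each arm is forced only when the body first touches it:

```r
# Simplified sketch only; ifelse_sketch is a hypothetical name and the real
# ifelse() does more (attributes, coercion of the test, etc.).
ifelse_sketch <- function(test, yes, no) {
  ans <- test                          # result takes the shape of test
  ok  <- !is.na(test)                  # NA conditions stay NA in the result
  if (any(test[ok])) {                 # 'yes' is forced only if some TRUE
    pick <- test & ok
    ans[pick] <- rep(yes, length.out = length(ans))[pick]
  }
  if (any(!test[ok])) {                # 'no' is forced only if some FALSE
    pick <- !test & ok
    ans[pick] <- rep(no, length.out = length(ans))[pick]
  }
  ans
}

# f and g as defined in the original question
f <- function(t) { cat("f for", t, "\n"); return(2*t) }
g <- function(t) { cat("g for", t, "\n"); return(3*t) }

t <- 2*(1:5)
ifelse_sketch(t %% 2 == 0, g(t), f(t))   # only the "g for" line is printed
```

Note that when an arm is forced, it is evaluated for the whole of t, never for a subset.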
If you want to operate on elements of vectors, you need to use subsetting, e.g.:
s = rep(NA, length(t)); b = t %% 2 == 0; s[b] = g(t[b]); s[!b] = f(t[!b])
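Spelled out over several lines, the subsetting approach might look like the sketch below.  The use of which() is an addition that is not in the one-liner: it drops NA conditions so that the assignment also works when t contains NAs (a logical subscript with NAs is rejected in assignments of more than one value).

```r
# f and g as defined in the original question
f <- function(t) { cat("f for", t, "\n"); return(2*t) }
g <- function(t) { cat("g for", t, "\n"); return(3*t) }

t <- 1:30
s <- rep(NA_real_, length(t))    # positions with an NA condition stay NA
b <- t %% 2 == 0
even <- which(b)                 # indices where the condition is TRUE
odd  <- which(!b)                # indices where the condition is FALSE
s[even] <- g(t[even])            # g sees only the even values
s[odd]  <- f(t[odd])             # f sees only the odd values
```

Each function is now called once, on just the elements it is responsible for.  This only gives the same answer as ifelse() when f and g work element by element.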
I agree that it might be counterintuitive for a beginner, but so is 0! = 0^0 = 1, and both follow from first principles (e.g., n! = n(n-1)!).
"Counterintuitive" is not the same as "incorrect", and "correct" is not the same as "efficient".  :)
HTH
Rex

> t = 1:30
> ifelse(t%%2==0,g(t),f(t))
g for 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
f for 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 [1]  2  6  6 12 10 18 14 24 18 30 22 36 26 42 30 48 34 54 38 60 42 66 46 72 50
[26] 78 54 84 58 90

> t = 2*(1:30)
> ifelse(t%%2==0,g(t),f(t))
g for 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60
 [1]   6  12  18  24  30  36  42  48  54  60  66  72  78  84  90  96 102 108 114
[20] 120 126 132 138 144 150 156 162 168 174 180

> t = 2*(1:30)+1
> ifelse(t%%2==0,g(t),f(t))
f for 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61
 [1]   6  10  14  18  22  26  30  34  38  42  46  50  54  58  62  66  70  74  78
[20]  82  86  90  94  98 102 106 110 114 118 122

> t = rep(c(1,2,NA),3)
> ifelse(t%%2==0,g(t),f(t))
g for 1 2 NA 1 2 NA 1 2 NA
f for 1 2 NA 1 2 NA 1 2 NA
[1]  2  6 NA  2  6 NA  2  6 NA

> t = rep(NA,10)
> ifelse(t%%2==0,g(t),f(t))
 [1] NA NA NA NA NA NA NA NA NA NA

> t=1:30
> ifelse(c(TRUE,FALSE,FALSE,TRUE),g(t),f(t))
g for 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
f for 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
[1]  3  4  6 12
>

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of ivo welch
Sent: Tuesday, March 01, 2011 5:20 PM
To: William Dunlap
Cc: r-help
Subject: Re: [R] inefficient ifelse() ?

yikes.  you are asking me too much.

thanks everybody for the information.  I learned something new.

my suggestion would be for the much smarter language designers (than
I) to offer us more or less blissfully ignorant users another
vector-related construct in R.  It could perhaps be named %if% %else%,
analogous to if else (with naming inspired by %in%, and with
evaluation only of relevant parts [just as if else for scalars]), with
different outcomes in some cases, but with the advantage of typically
evaluating only half as many conditions as the ifelse() vector
construct.  %if% %else% may work only in a subset of cases, but when
it does work, it would be nice to have.  it would probably be my first
"go-to" function, with ifelse() use only as a fallback.
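
As a rough illustration of what such a construct could look like, here is a sketch that takes functions rather than expressions for the two arms, so that nothing has to guess how to rewrite an expression.  The name ifelse_split and its argument list are hypothetical; nothing like this exists in base R under that name.

```r
# Hypothetical helper, not part of base R: applies 'yes' to the elements of x
# where test is TRUE and 'no' to those where it is FALSE.
ifelse_split <- function(test, x, yes, no) {
  stopifnot(is.logical(test), length(test) == length(x),
            is.function(yes), is.function(no))
  out <- rep(NA, length(x))      # NA conditions give NA results
  i <- which(test)               # TRUE positions (NAs dropped)
  j <- which(!test)              # FALSE positions (NAs dropped)
  if (length(i)) out[i] <- yes(x[i])   # 'yes' is skipped if nothing is TRUE
  if (length(j)) out[j] <- no(x[j])    # 'no' is skipped if nothing is FALSE
  out
}

x <- c(0.5, 2, NA, 10)
ifelse_split(x > 1, x, log, function(v) -v)
```

This works only when each arm depends on nothing but the selected elements; something like mean(x, na.rm=TRUE), which needs the whole vector, would still call for plain ifelse().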

of course, I now know how to fix my specific issue.  I was just
surprised that my first choice, ifelse(), was not as optimized as I
had thought.

best,

/iaw


On Tue, Mar 1, 2011 at 5:13 PM, William Dunlap <wdunlap at tibco.com> wrote:
> An ifelse-like function that only evaluated
> what was needed would be fine, but it would
> have to be different from ifelse itself.  The
> trick is to come up with a good parameterization.
>
> E.g., how would it deal with things like
>   ifelse(is.na(x), mean(x, na.rm=TRUE), x)
> or
>   ifelse(x>1, log(x), runif(length(x),-1,0))
> or
>   ifelse(x>1, log(x), -seq_along(x))
> Would it reject such things?  Deciding that the
> x in mean(x,na.rm=TRUE) should be replaced by
> x[is.na(x)] would be wrong.  Deciding that
> runif(length(x)) should be replaced by runif(sum(x>1))
> seems a bit much to expect.  Replacing seq_along(x) with
> seq_len(sum(x>1)) is wrong.  It would be better to
> parameterize the new function so it wouldn't have to
> think about those cases.
>
> Would you want it to depend only on a logical
> vector or perhaps also on a factor (a vectorized
> switch/case function)?
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of ivo welch
>> Sent: Tuesday, March 01, 2011 12:36 PM
>> To: Henrique Dallazuanna
>> Cc: r-help
>> Subject: Re: [R] inefficient ifelse() ?
>>
>> thanks, Henrique.  did you mean
>>
>>     as.vector(t(mapply(function(x, f)f(x), split(t, ((t %% 2)==0)),
>> list(f, g))))   ?
>>
>> otherwise, you get a matrix.
>>
>> it's a good solution, but unfortunately I don't think this can be used
>> to redefine ifelse(cond,ift,iff) in a way that is transparent.  the
>> ift and iff functions will always be evaluated before the function
>> call happens, even with lazy evaluation.  :-(
>>
>> I still think that it makes sense to have a smarter vectorized %if% in
>> a vectorized language like R.  just my 5 cents.
>>
>> /iaw
>>
>> ----
>> Ivo Welch (ivo.welch at brown.edu, ivo.welch at gmail.com)
>>
>>
>>
>>
>>
>> On Tue, Mar 1, 2011 at 2:33 PM, Henrique Dallazuanna
>> <wwwhsd at gmail.com> wrote:
>> > Try this:
>> >
>> > mapply(function(x, f)f(x), split(t, t %% 2), list(g, f))
>> >
>> > On Tue, Mar 1, 2011 at 4:19 PM, ivo welch <ivowel at gmail.com> wrote:
>> >>
>> >> dear R experts---
>> >>
>> >>  t <- 1:30
>> >>  f <- function(t) { cat("f for", t, "\n"); return(2*t) }
>> >>  g <- function(t) { cat("g for", t, "\n"); return(3*t) }
>> >>  s <- ifelse( t%%2==0, g(t), f(t))
>> >>
>> >> shows that the ifelse function actually evaluates both f() and g()
>> >> for all values first, and presumably then just picks left or right
>> >> results based on t%%2.  uggh... wouldn't it make more sense to
>> >> evaluate only the relevant parts of each vector and then reassemble
>> >> them?
>> >>
>> >> /iaw
>> >> ----
>> >> Ivo Welch
>> >>
>> >> ______________________________________________
>> >> R-help at r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >
>> >
>> >
>> > --
>> > Henrique Dallazuanna
>> > Curitiba-Paraná-Brasil
>> > 25° 25' 40" S 49° 16' 22" O
>> >
>>
>>
>
