[R] Multiple if function

Thu Sep 17 10:17:11 CEST 2015

> On 17 Sep 2015, at 01:42, Dénes Tóth <toth.denes at ttk.mta.hu> wrote:
> 
> 
> 
> On 09/16/2015 04:41 PM, Bert Gunter wrote:
>> Yes! Chuck's use of mapply is exactly the split/combine strategy I was
>> looking for. In retrospect, exactly how one should think about it.
>> Many thanks to all for a constructive discussion.
>> 
>> -- Bert
>> 
>> 
>> Bert Gunter
>> 
>>>>> 
>>>>> Use mapply like this on large problems:
>>>>> 
>>>>> unsplit(
>>>>>   mapply(
>>>>>       function(x,z) eval( x, list( y=z )),
>>>>>       expression( A=y*2, B=y+3, C=sqrt(y) ),
>>>>>       split( dat$Flow, dat$ASB ),
>>>>>       SIMPLIFY=FALSE),
>>>>>   dat$ASB)
>>>>> 
>>>>> Chuck
>>>>> 
> 
> 
> Is there any reason not to use data.table for this purpose, especially if efficiency is a concern?
> 
> ---
> 
> # load data.table and microbenchmark
> library(data.table)
> library(microbenchmark)
> #
> # prepare data
> DF <- data.frame(
>    ASB = rep_len(factor(LETTERS[1:3]), 3e5),
>    Flow = rnorm(3e5)^2)
> DT <- as.data.table(DF)
> DT[, ASB := as.character(ASB)]
> #
> # define functions
> #
> # Chuck's version
> fnSplit <- function(dat) {
>    unsplit(
>        mapply(
>            function(x,z) eval( x, list( y=z )),
>            expression( A=y*2, B=y+3, C=sqrt(y) ),
>            split( dat$Flow, dat$ASB ),
>            SIMPLIFY=FALSE),
>        dat$ASB)
> }
> #
> # data.table-way (IMHO, much easier to read)
> fnDataTable <- function(dat) {
>    dat[,
>        result :=
>            if (.BY == "A") {
>                2 * Flow
>            } else if (.BY == "B") {
>                3 + Flow
>            } else if (.BY == "C") {
>                sqrt(Flow)
>            },
>        by = ASB]
> }
> #
> # benchmark
> #
> microbenchmark(fnSplit(DF), fnDataTable(DT))
> identical(fnSplit(DF), fnDataTable(DT)[, result])
> 
> ---
> 
> Actually, in Chuck's version the unsplit() part is slow. If the row order does not matter (e.g., DF is reordered by ASB before calling fnSplit), fnSplit is comparable to the DT-version.
> 

But David’s version is faster than Chuck’s fnSplit. I modified David’s solution slightly to get a result that is identical to fnSplit.

# David's version
# my modification to return a vector just like fnSplit
fnDavid <- function(dat) {
    z <- mapply(
          function(x,z) eval( x, list( y=z )),
          expression(A= y*2, B=y+3, C=sqrt(y) ),
          split( dat$Flow, dat$ASB ),
          USE.NAMES=FALSE, SIMPLIFY=TRUE
        )
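    # z is a matrix with one column per group; transposing and flattening
    # interleaves the columns, which matches DF because ASB cycles A, B, C
    as.vector(t(z))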
}
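
A caveat on as.vector(t(z)): it reproduces the original row order only because the groups in DF have equal size and ASB cycles A, B, C. For other layouts, a minimal sketch of an order-safe variant could look like the following. It uses base R only, and fnByIndex is a hypothetical name of my own, not something from this thread:

```r
# Hypothetical helper (not from the thread): write each group's result
# back into its original row positions, so neither balanced groups nor
# a cyclic A, B, C layout is required.
fnByIndex <- function(dat) {
    exprs <- expression(A = y*2, B = y+3, C = sqrt(y))
    out <- numeric(nrow(dat))
    idx <- split(seq_len(nrow(dat)), dat$ASB)   # row positions per group
    for (g in names(idx)) {
        out[idx[[g]]] <- eval(exprs[[g]], list(y = dat$Flow[idx[[g]]]))
    }
    out
}

# Small unbalanced, unordered example (made-up data for illustration)
set.seed(42)
small <- data.frame(
    ASB  = factor(sample(LETTERS[1:3], 20, replace = TRUE),
                  levels = LETTERS[1:3]),
    Flow = rnorm(20)^2)

# Reference result computed row by row
expected <- ifelse(small$ASB == "A", 2 * small$Flow,
            ifelse(small$ASB == "B", 3 + small$Flow, sqrt(small$Flow)))
stopifnot(isTRUE(all.equal(fnByIndex(small), expected)))
```

I have not benchmarked this variant, so it says nothing about speed; it only shows one way to drop the dependence on the data layout.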

I added fnDavid to Dénes's code.
Benchmarking with the R package rbenchmark and testing the results like this

library(rbenchmark)
benchmark(fnSplit(DF), fnDataTable(DT),fnDavid(DF))
identical(fnSplit(DF), fnDataTable(DT)[, result])
identical(fnSplit(DF), fnDavid(DF))

gave this:

             test replications elapsed relative user.self sys.self user.child sys.child
2 fnDataTable(DT)          100   0.829    1.000     0.762    0.066          0         0
3     fnDavid(DF)          100   1.615    1.948     1.515    0.098          0         0
1     fnSplit(DF)          100   2.878    3.472     2.685    0.190          0         0

> identical(fnSplit(DF), fnDataTable(DT)[, result])
[1] TRUE
> identical(fnSplit(DF), fnDavid(DF))
[1] TRUE
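
On Dénes's remark that unsplit() is the slow part: if the data are first sorted by ASB, the groups are contiguous and in level order, so the pieces can simply be concatenated with unlist() instead. A small sketch of that idea, assuming base R only and made-up data (the names sorted, pieces, viaUnlist and viaUnsplit are mine):

```r
set.seed(1)
dat <- data.frame(
    ASB  = rep_len(factor(LETTERS[1:3]), 30),
    Flow = rnorm(30)^2)

# Sort by group so that all A rows come first, then B, then C
sorted <- dat[order(dat$ASB), ]

pieces <- mapply(
    function(x, z) eval(x, list(y = z)),
    expression(A = y*2, B = y+3, C = sqrt(y)),
    split(sorted$Flow, sorted$ASB),
    SIMPLIFY = FALSE)

# On sorted data, plain concatenation gives the same vector as unsplit()
viaUnlist  <- unlist(pieces, use.names = FALSE)
viaUnsplit <- unsplit(pieces, sorted$ASB)
stopifnot(identical(viaUnlist, viaUnsplit))
```

The result is in the sorted row order, not the original one, so this only helps when the order does not matter or the data can stay sorted.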

Berend