[R] Multiple if function

Dénes Tóth toth.denes at ttk.mta.hu
Thu Sep 17 01:42:37 CEST 2015



On 09/16/2015 04:41 PM, Bert Gunter wrote:
> Yes! Chuck's use of mapply is exactly the split/combine strategy I was
> looking for. In retrospect, exactly how one should think about it.
> Many thanks to all for a constructive discussion.
>
> -- Bert
>
>
> Bert Gunter
>
>>>>
>>>> Use mapply like this on large problems:
>>>>
>>>> unsplit(
>>>>    mapply(
>>>>        function(x,z) eval( x, list( y=z )),
>>>>        expression( A=y*2, B=y+3, C=sqrt(y) ),
>>>>        split( dat$Flow, dat$ASB ),
>>>>        SIMPLIFY=FALSE),
>>>>    dat$ASB)
>>>>
>>>> Chuck
>>>>


Is there any reason not to use data.table for this purpose, especially 
if efficiency is a concern?

---

# load data.table and microbenchmark
library(data.table)
library(microbenchmark)
#
# prepare data
DF <- data.frame(
     ASB = rep_len(factor(LETTERS[1:3]), 3e5),
     Flow = rnorm(3e5)^2)
DT <- as.data.table(DF)
DT[, ASB := as.character(ASB)]
#
# define functions
#
# Chuck's version
fnSplit <- function(dat) {
     unsplit(
         mapply(
             function(x,z) eval( x, list( y=z )),
             expression( A=y*2, B=y+3, C=sqrt(y) ),
             split( dat$Flow, dat$ASB ),
             SIMPLIFY=FALSE),
         dat$ASB)
}
#
# data.table-way (IMHO, much easier to read)
fnDataTable <- function(dat) {
     dat[,
         result :=
             if (.BY == "A") {
                 2 * Flow
             } else if (.BY == "B") {
                 3 + Flow
             } else if (.BY == "C") {
                 sqrt(Flow)
             },
         by = ASB]
}
#
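# benchmark
# (note: fnDataTable adds the 'result' column to DT by reference)
#
microbenchmark(fnSplit(DF), fnDataTable(DT))
#
# check that both versions return the same values
identical(fnSplit(DF), fnDataTable(DT)[, result])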

---

Actually, the unsplit() step is the slow part of Chuck's version. If the 
order of the result is not a concern (e.g., DF is sorted by ASB before 
calling fnSplit), fnSplit is comparable to the data.table version.
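
To illustrate the point about unsplit(), here is a minimal base-R sketch 
of what a version without it might look like. The name fnSplitSorted is 
hypothetical (not from this thread), and the sketch assumes that dat is 
already sorted by ASB and that the levels of ASB are exactly A, B and C. 
Under those assumptions the per-group results can simply be concatenated 
with unlist() instead of being put back in place by unsplit().

```r
# Sketch only: fnSplitSorted is a hypothetical name, not from the thread.
# Assumes dat is already sorted by ASB and the levels of ASB are A, B, C.
fnSplitSorted <- function(dat) {
    unlist(
        mapply(
            function(x, z) eval(x, list(y = z)),
            expression(A = y * 2, B = y + 3, C = sqrt(y)),
            split(dat$Flow, dat$ASB),
            SIMPLIFY = FALSE),
        use.names = FALSE)  # concatenate groups in level order
}

# small example data, sorted by ASB beforehand
set.seed(1)
DF <- data.frame(
    ASB = rep_len(factor(LETTERS[1:3]), 30),
    Flow = rnorm(30)^2)
DFsorted <- DF[order(DF$ASB), ]

# reference: the unsplit() version applied to the same sorted data
unsplitSorted <- unsplit(
    mapply(
        function(x, z) eval(x, list(y = z)),
        expression(A = y * 2, B = y + 3, C = sqrt(y)),
        split(DFsorted$Flow, DFsorted$ASB),
        SIMPLIFY = FALSE),
    DFsorted$ASB)

resSorted <- fnSplitSorted(DFsorted)
all.equal(resSorted, unsplitSorted)
```

The result follows the row order of the sorted data, so it only lines up 
with the original rows if DF itself is kept in that sorted order.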


Denes


