[R] Possible Improvement to sapply
Martin Maechler
maechler at stat.math.ethz.ch
Tue Mar 13 17:22:23 CET 2018
>>>>> Doran, Harold <HDoran at air.org>
>>>>> on Tue, 13 Mar 2018 16:14:19 +0000 writes:
> You’re right, it sure does. My suggestion causes it to fail when simplify = ‘array’
> From: William Dunlap [mailto:wdunlap at tibco.com]
> Sent: Tuesday, March 13, 2018 12:11 PM
> To: Doran, Harold <HDoran at air.org>
> Cc: r-help at r-project.org
> Subject: Re: [R] Possible Improvement to sapply
> Wouldn't that change how simplify='array' is handled?
>> str(sapply(1:3, function(x)diag(x,5,2), simplify="array"))
> int [1:5, 1:2, 1:3] 1 0 0 0 0 0 1 0 0 0 ...
>> str(sapply(1:3, function(x)diag(x,5,2), simplify=TRUE))
> int [1:10, 1:3] 1 0 0 0 0 0 1 0 0 0 ...
>> str(sapply(1:3, function(x)diag(x,5,2), simplify=FALSE))
> List of 3
> $ : int [1:5, 1:2] 1 0 0 0 0 0 1 0 0 0
> $ : int [1:5, 1:2] 2 0 0 0 0 0 2 0 0 0
> $ : int [1:5, 1:2] 3 0 0 0 0 0 3 0 0 0
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com<http://tibco.com>
Yes, indeed, thank you Bill!
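A minimal sketch of why the shortcut breaks, reusing the input from Bill's example. The names base_test and proposed_ok are illustrative only and do not appear in the sapply source:

```r
# Same input as Bill's example: three 5 x 2 matrices, simplify = "array".
simplify <- "array"
answer <- lapply(1:3, function(x) diag(x, 5, 2))

# Current test in sapply: only asks whether simplify is exactly FALSE,
# so a character value such as "array" passes through.
base_test <- !identical(simplify, FALSE) && length(answer)

# Proposed test: hands the character value straight to `&&`,
# which signals an error instead of returning TRUE or FALSE.
proposed_ok <- tryCatch({
  simplify && length(answer)
  TRUE
}, error = function(e) FALSE)

print(base_test)
print(proposed_ok)
```

With simplify = TRUE or simplify = FALSE both tests agree, which is presumably why limited testing on those two values did not show the difference.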
I sometimes marvel at how much the mental capacities of R core
are underestimated. Of course, nobody is perfect, but the bugs
we produce are really more subtle than that ... ;-)
Martin Maechler
R core
> On Tue, Mar 13, 2018 at 6:23 AM, Doran, Harold <HDoran at air.org> wrote:
> While working with sapply, I noticed that the documentation states that the simplify argument will yield a vector, matrix, etc. "when possible". I was curious how the code actually defines "when possible" and found this within the function:
> if (!identical(simplify, FALSE) && length(answer))
> This seems superfluous to me, in particular this part:
> !identical(simplify, FALSE)
> The preceding code could be reduced to
> if (simplify && length(answer))
> and it would not need to execute the call to identical in order to trigger the conditional execution, which is known from the user's simplify = TRUE or FALSE inputs. I *think* the extra call to identical is just unnecessary overhead in this instance.
> Take, for example, the following toy code, a small modification to sapply, and the benchmark results:
> myList <- list(a = rnorm(100), b = rnorm(100))
> answer <- lapply(X = myList, FUN = length)
> simplify = TRUE
> library(microbenchmark)
> mySapply <- function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE){
>     FUN <- match.fun(FUN)
>     answer <- lapply(X = X, FUN = FUN, ...)
>     if (USE.NAMES && is.character(X) && is.null(names(answer)))
>         names(answer) <- X
>     if (simplify && length(answer))
>         simplify2array(answer, higher = (simplify == "array"))
>     else answer
> }
>> microbenchmark(sapply(myList, length), times = 10000L)
> Unit: microseconds
>                    expr    min     lq     mean median     uq    max neval
>  sapply(myList, length) 14.156 15.572 16.67603 15.926 16.634 650.46 10000
>> microbenchmark(mySapply(myList, length), times = 10000L)
> Unit: microseconds
>                      expr    min     lq     mean median     uq      max neval
>  mySapply(myList, length) 13.095 14.864 16.02964 15.218 15.573 1671.804 10000
> My benchmark shows a small timing improvement from that one change (median 15.218 vs. 15.926 microseconds). It looks nominal, but in my actual work sapply is called millions of times, so the extra overhead adds up to some additional computing time overall.
> I have done some limited testing on various real data to verify that both variants of sapply (base R and my modified version) yield identical objects when simplify is either TRUE or FALSE.
> Perhaps someone else sees a counterexample where my proposed change causes sapply not to behave as expected.
> Harold
> ______________________________________________
> R-help at r-project.org<mailto:R-help at r-project.org> mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.