[R] lapply (and friends) with data.frames are slow
R. Michael Weylandt
michael.weylandt at gmail.com
Sat Jan 5 21:46:47 CET 2013
On Sat, Jan 5, 2013 at 7:38 PM, Kevin Ushey <kevinushey at gmail.com> wrote:
> Hey guys,
> I noticed something curious in the lapply call. I'll copy+paste the
> function call here because it's short enough:
> lapply <- function (X, FUN, ...)
> {
>     FUN <- match.fun(FUN)
>     if (!is.vector(X) || is.object(X))
>         X <- as.list(X)
>     .Internal(lapply(X, FUN))
> }
> Notice that lapply coerces X to a list if the !is.vector(X) || is.object(X)
> check passes.
> Curiously, data.frames fail the test (is.vector(data.frame()) returns
> FALSE); but it seems that coercion of a data.frame
> to a list would be unnecessary for the *apply family of functions.
> Is there a reason why we must coerce data.frames to list for these
> functions? I thought data.frames were essentially just 'structured lists'?
> I ask because it is generally quite slow coercing a (large) data.frame to a
> list, and it seems like this could be avoided for data.frames.
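The claims in the quoted message are easy to check at the console. A minimal sketch, where the data frame `df` and its columns are made up purely for illustration:

```r
# Hypothetical toy data frame, only for illustration
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

is.vector(df)   # FALSE: attributes other than names (class, row.names)
is.object(df)   # TRUE: it carries a class attribute
is.list(df)     # TRUE: the underlying storage is a list of columns

# So lapply() takes the as.list() branch, which dispatches to
# as.list.data.frame() and returns a plain list of the columns.
l <- as.list(df)
is.vector(l)    # TRUE: only a names attribute is left
is.object(l)    # FALSE

# The per-column results are the same either way
res_df   <- lapply(df, mean)
res_list <- lapply(l, mean)
identical(res_df, res_list)
```

So the coercion changes the attributes of X but not what FUN sees for each column.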
Not sure it's a huge deal, but it does seem to be an avoidable
function call with something like this:
lapply1 <- function (X, FUN, ...)
{
    FUN <- match.fun(FUN)
    # skip the as.list() coercion for plain vectors and for data.frames
    if (!(is.vector(X) && !is.object(X) || is.data.frame(X)))
        X <- as.list(X)
    .Internal(lapply(X, FUN))
}
On a microbenchmark:
xx <- data.frame(rnorm(5e7), rexp(5e7), runif(5e7))
xx <- cbind(xx, xx, xx, xx, xx)
It saves me about 50% of the time -- that's of course only using a
relatively cheap FUN argument.
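For anyone who wants to try this at a smaller scale, here is a rough sketch of one way to see how much of the lapply() time goes to the coercion. It times as.list() on its own rather than calling .Internal directly. The size `n`, the column names, the repetition count `reps` and the helper `cheap` are all arbitrary choices for illustration, and timings will vary a lot with R version and machine:

```r
# Much smaller than the 5e7-row example so that it runs quickly
n  <- 1e5
xx <- data.frame(a = rnorm(n), b = rexp(n), c = runif(n))
xx <- cbind(xx, xx, xx, xx, xx)        # 15 columns

cheap <- function(col) length(col)     # deliberately cheap FUN
reps  <- 100                           # repeat to get measurable timings

# Cost of the coercion alone
t_coerce <- system.time(for (i in seq_len(reps)) xl <- as.list(xx))
# lapply() on the data.frame (coercion happens inside lapply)
t_df     <- system.time(for (i in seq_len(reps)) r_df <- lapply(xx, cheap))
# lapply() on the already-coerced list (no coercion needed)
t_list   <- system.time(for (i in seq_len(reps)) r_l <- lapply(xl, cheap))

print(c(coerce      = t_coerce[["elapsed"]],
        lapply_df   = t_df[["elapsed"]],
        lapply_list = t_list[["elapsed"]]))
```

If the coercion dominates, lapply_df should come out close to coerce + lapply_list; with a more expensive FUN the difference would matter less.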
Others will hopefully comment more