[R] Repeated Aggregation with data.table

R. Michael Weylandt michael.weylandt at gmail.com
Wed Aug 8 04:53:28 CEST 2012


On Tue, Aug 7, 2012 at 4:36 PM, Elliot Joel Bernstein
<elliot.bernstein at fdopartners.com> wrote:
> I have been using ddply to do aggregation, and I frequently define a
> single aggregation function that I use to aggregate over different
> groups. For example,
>
> require(plyr)
>
> dat <- data.frame(x = sample(3, 100, replace = TRUE),
>                   y = sample(3, 100, replace = TRUE),
>                   z = rnorm(100))
>
> f <- function(x) { data.frame(mean.z = mean(x$z), sd.z = sd(x$z)) }
>
> ddply(dat, "x", f)
> ddply(dat, "y", f)
> ddply(dat, c("x", "y"), f)
>
> I recently discovered the data.table package, which dramatically
> speeds up the aggregation:
>
> require(data.table)
> dat <- data.table(dat)
>
> dat[, list(mean.z = mean(z), sd.z = sd(z)), list(x)]
> dat[, list(mean.z = mean(z), sd.z = sd(z)), list(y)]
> dat[, list(mean.z = mean(z), sd.z = sd(z)), list(x,y)]
>
> But I can't figure out how to save the aggregation function
> "list(mean.z = mean(z), sd.z = sd(z))" as a variable that I can reuse,
> similar to the function "f" above. Can someone please explain how to
> do that?

One exceptionally kludgy way: store the aggregation as an unevaluated
expression and eval() it in the j argument:

zzz <- expression(list(mean.z = mean(z), sd.z = sd(z)))

dat[, eval(zzz), list(x,y)]
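
For reference, here is a slightly fuller sketch of the same idea that can
be run as is. It rebuilds the example data from the question, stores the
aggregation with quote(), and also shows a plain function of z as a second
option. The names agg, aggregate_by, and f are only illustrative, and
set.seed() is added purely for reproducibility; none of them come from the
original post.

```r
library(data.table)

set.seed(1)  # assumption: added only so the example is reproducible
dat <- data.table(x = sample(3, 100, replace = TRUE),
                  y = sample(3, 100, replace = TRUE),
                  z = rnorm(100))

# Option 1: keep the aggregation as an unevaluated call and eval() it in j.
agg <- quote(list(mean.z = mean(z), sd.z = sd(z)))

by_x  <- dat[, eval(agg), by = x]
by_y  <- dat[, eval(agg), by = y]
by_xy <- dat[, eval(agg), by = list(x, y)]

# Hypothetical helper: take the grouping columns as a character vector.
aggregate_by <- function(dt, cols) dt[, eval(agg), by = cols]
by_xy2 <- aggregate_by(dat, c("x", "y"))

# Option 2: an ordinary function of the column that returns a named list,
# which is closer in spirit to the plyr function f in the question.
f <- function(z) list(mean.z = mean(z), sd.z = sd(z))
by_x_f <- dat[, f(z), by = x]
```

Option 2 avoids eval() entirely, at the cost of having to pass the columns
explicitly; which one reads better probably depends on how many columns
the aggregation touches.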

Michael

>
> Thanks.
>
> - Elliot
>
> --
> Elliot Joel Bernstein, Ph.D. | Research Associate | FDO Partners, LLC
> 134 Mount Auburn Street | Cambridge, MA | 02138
> Phone: (617) 503-4619 | Email: elliot.bernstein at fdopartners.com