[R] How to do aggregation in R?

Fri Mar 20 05:15:17 CET 2009

On Thu, Mar 19, 2009 at 8:40 PM, jim holtman <jholtman at gmail.com> wrote:
> Try this technique. I use it with large data objects because working
> with indices is sometimes faster and uses less memory:
>
> x <- read.table(textConnection("  v1 v2 n1 n2
> 1   a a1  1 21
> 2   a a1  2 22
> 3   a a1  3 23
> 4   a a2  4 24
> 5   a a3  5 25
> 6   b b1  6 26
> 7   b b1  7 27
> 8   b b2  8 28
> 9   b b2  9 29
> 10  b b2 10 30
> 11  c c1 11 31
> 12  c c2 12 32
> 13  c c2 13 33
> 14  c c2 14 34
> 15  c c3 15 35
> 16  d d1 16 36
> 17  d d2 17 37
> 18  d d3 18 38
> 19  d d4 19 39
> 20  d d4 20 40"), header=TRUE)
> closeAllConnections()
> # use indices to reduce memory
> x.ind <- split(seq(nrow(x)), list(x$v1, x$v2), drop=TRUE)
> # now aggregate using the indices
> x.agg <- do.call(rbind, lapply(x.ind, function(.seg){
>    data.frame(v1=x$v1[.seg[1]], v2=x$v2[.seg[1]],
>        n1=sum(x$n1[.seg]), n2=sum(x$n2[.seg]))
> }))
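To check the index-based result, here is a minimal sketch that compares it with base R's aggregate() using its formula interface. It assumes R >= 4.0, where strings are not converted to factors by default. It also rebuilds the same 20-row example data with rep() instead of read.table(). The names x.base and key() are illustrative only and are not part of the original post.

```r
# Rebuild the 20-row example data compactly (same values as in the post)
x <- data.frame(
  v1 = rep(c("a", "b", "c", "d"), each = 5),
  v2 = c("a1", "a1", "a1", "a2", "a3",
         "b1", "b1", "b2", "b2", "b2",
         "c1", "c2", "c2", "c2", "c3",
         "d1", "d2", "d3", "d4", "d4"),
  n1 = 1:20,
  n2 = 21:40
)

# Index-based aggregation, as in the post
x.ind <- split(seq(nrow(x)), list(x$v1, x$v2), drop = TRUE)
x.agg <- do.call(rbind, lapply(x.ind, function(.seg) {
  data.frame(v1 = x$v1[.seg[1]], v2 = x$v2[.seg[1]],
             n1 = sum(x$n1[.seg]), n2 = sum(x$n2[.seg]))
}))

# Same group sums via base R's aggregate() (formula interface)
x.base <- aggregate(cbind(n1, n2) ~ v1 + v2, data = x, FUN = sum)

# Hypothetical helper: sort by group and drop row names so the
# two results can be compared regardless of row order
key <- function(d) {
  d <- d[order(d$v1, d$v2), c("v1", "v2", "n1", "n2")]
  rownames(d) <- NULL
  d
}
stopifnot(isTRUE(all.equal(key(x.agg), key(x.base))))
```

aggregate() is shorter to write. The index version lets you compute several different summaries per group inside one function while touching only the rows each group needs.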

This is basically the approach that the plyr package,
http://had.co.nz/plyr, uses behind a user-friendly interface.
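For comparison, here is a sketch of the same summary written with plyr's ddply(). It assumes plyr is installed from CRAN. ddply() splits x by every v1/v2 combination and applies summarise() to each piece. It then combines the pieces back into one data frame, which is the split/apply/combine pattern that the index code above spells out by hand.

```r
library(plyr)

# Same 20-row example data as in the post
x <- data.frame(
  v1 = rep(c("a", "b", "c", "d"), each = 5),
  v2 = c("a1", "a1", "a1", "a2", "a3",
         "b1", "b1", "b2", "b2", "b2",
         "c1", "c2", "c2", "c2", "c3",
         "d1", "d2", "d3", "d4", "d4"),
  n1 = 1:20,
  n2 = 21:40
)

# Split by each v1/v2 combination that occurs, sum n1 and n2 within
# each piece, and bind the per-group rows into a single data frame
x.agg <- ddply(x, .(v1, v2), summarise, n1 = sum(n1), n2 = sum(n2))
print(x.agg)
```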

Hadley

-- 
http://had.co.nz/