[R] sum specific rows in a data frame

Chuck vijay.nori at gmail.com
Thu Apr 15 03:16:25 CEST 2010


Depending on the size of the data frame and the operations you are
trying to perform, either aggregate or ddply may be the faster choice.
In the function below, df has the same structure as your data frame.
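
Before timing anything, it may help to see what each call returns. The
following is only a small sketch: the toy values are made up for
illustration, the formula interface to aggregate and plyr's summarise
helper are my choices rather than anything from the original question,
and the ddply part runs only if plyr happens to be installed.

```r
# Toy data (hypothetical values): a grouping id and two numeric columns.
df <- data.frame(
    id     = c("A", "A", "B", "B", "B", "C"),
    fltval = c(0.5, 1.5, 2.0, -1.0, 3.0, 4.0),
    intval = c(1L, 2L, 3L, 4L, 5L, 6L)
)

# Base R: the formula interface sums both columns within each id.
agg <- aggregate(cbind(fltval, intval) ~ id, data = df, FUN = sum)
print(agg)

# plyr (assumed installed): summarise names the output columns explicitly.
if (requireNamespace("plyr", quietly = TRUE)) {
    dd <- plyr::ddply(df, "id", plyr::summarise,
                      fltval = sum(fltval), intval = sum(intval))
    print(dd)
    # Compare the values only; column types and row names may differ.
    print(isTRUE(all.equal(agg$fltval, dd$fltval)) &&
          isTRUE(all.equal(agg$intval, dd$intval)))
}
```

Both results should show sums of 2, 4 and 4 for fltval and 3, 12 and 6
for intval, one row per id.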

The code below times aggregate and ddply for a range of data frame
sizes.
============================
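library(plyr)

# Time aggregate() and ddply() on a data frame with 45*n rows in three groups.
CompareAggregation <- function(n) {
    df <- data.frame(id = c(rep("A", 15 * n), rep("B", 10 * n), rep("C", 20 * n)))
    df$fltval <- rnorm(nrow(df))
    df$intval <- rbinom(nrow(df), 1000, 0.8)
    # Per-id sums of both columns with base R.
    t1 <- system.time(zz1 <- aggregate(list(fltsum = df$fltval, intsum = df$intval),
                                       list(id = df$id), sum))
    # The same sums with plyr.
    t2 <- system.time(zz2 <- ddply(df, .(id), function(x)
        c(fltsum = sum(x$fltval), intsum = sum(x$intval))))
    # [[1]] is the user CPU time in seconds.
    c(agg = t1[[1]], ddply = t2[[1]])
}

# Group-size multipliers: 10, 100, ..., 1e5.
z <- 10^seq(1, 5)
names(z) <- as.character(z)
# One row per n, one column per method.
res.df <- t(data.frame(lapply(z, CompareAggregation)))
print(res.df)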
============================
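
A note on the timings: the function above keeps only the first element
of what system.time returns. As a rough sketch of what the other fields
are (the loop being timed here is an arbitrary stand-in, not part of the
benchmark), the result can be inspected by name:

```r
# Time a throwaway computation; the workload itself is arbitrary.
tm <- system.time(for (i in 1:1e5) sqrt(i))

# The first three fields are user CPU, system CPU and wall-clock time.
print(names(tm)[1:3])

# Position 1 is the user CPU time, the value used in the benchmark above.
print(tm[[1]] == tm[["user.self"]])

# Wall-clock time is available under "elapsed".
print(tm[["elapsed"]])
```

If wall-clock time matters more than CPU time for your case, swapping
[[1]] for [["elapsed"]] in the benchmark would be one way to get it.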


On Apr 14, 11:43 am, "arnaud Gaboury" <arnaud.gabo... at gmail.com>
wrote:
> Thank you for your help. The best I have found is to use the ddply function.
>
> > pose


