[R] aggregating columns in a data frame in different ways

Gabor Grothendieck ggrothendieck at gmail.com
Sat Apr 29 00:57:11 CEST 2006


Here are three possibilities:

1. Aggregate the columns that you want to sum and the columns that
you want to average separately, and then merge the two results:

# By holds the grouping column 'type' as a one-column data frame
By <- A[, 2, drop = FALSE]
merge(aggregate(A[, 3, drop = FALSE], By, sum),
      aggregate(A[, 4, drop = FALSE], By, mean))
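
To try this out, the sample data from the question can be rebuilt as
below. This is only a sketch: A is reconstructed from the table quoted
further down, and selecting the columns by name instead of by position
(as well as the name res1) is my own variation, not part of the
suggestion above.

```r
# Sample data reconstructed from the table in the question
A <- data.frame(
  date  = factor(rep(c("2006-04-01", "2006-04-02", "2006-04-03"), each = 2)),
  type  = factor(rep(c("A", "B"), times = 3)),
  count = c(10, 4, 22, 8, 12, 14),
  value = c(99.6, 33.2, 43.2, 44.9, 12.4, 18.5)
)

# Grouping column, kept as a one-column data frame so its name is preserved
By <- A["type"]

# Sum the counts and average the values per type, then join on 'type'
res1 <- merge(aggregate(A["count"], By, sum),
              aggregate(A["value"], By, mean))
print(res1)
```

For this data res1 has one row per type, with counts 44 and 26 and
mean values of about 51.73 and 32.2, which matches the result asked
for in the question.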

2. Use by to apply a function that computes both summaries to each
type, and then rbind the pieces:

f <- function(x) with(x, c(count = sum(count), value = mean(value)))
do.call("rbind", by(A[, 3:4], A[, 2, drop = FALSE], f))
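
Note that the do.call/rbind call returns a numeric matrix with the
types as row names, not a data frame. If a data frame with a 'type'
column is wanted, as in the question, one possible conversion is
sketched below. The names m and res2 and the conversion step are my
additions, and A is again rebuilt from the table in the question.

```r
# Sample data reconstructed from the table in the question
A <- data.frame(
  date  = factor(rep(c("2006-04-01", "2006-04-02", "2006-04-03"), each = 2)),
  type  = factor(rep(c("A", "B"), times = 3)),
  count = c(10, 4, 22, 8, 12, 14),
  value = c(99.6, 33.2, 43.2, 44.9, 12.4, 18.5)
)

# Per-group summary: total count and mean value
f <- function(x) with(x, c(count = sum(count), value = mean(value)))

# One row per type; the types end up as the row names of the matrix
m <- do.call("rbind", by(A[, 3:4], A[, 2, drop = FALSE], f))

# Move the row names into a proper 'type' column
res2 <- data.frame(type = rownames(m), m, row.names = NULL)
print(res2)
```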

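3. Use summaryBy in the doBy package, which applies every function to
every column, and pick off the appropriate columns in the output
(here the type, the sum of count and the mean of value):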

library(doBy)
summaryBy(. ~ type, A[, -1], FUN = c(sum, mean))[, c(1, 2, 5)]


On 4/28/06, kavaumail-r at yahoo.com <kavaumail-r at yahoo.com> wrote:
> I would like to use aggregate() to combine statistics
> for several days in a data frame. My data frame looks
> similar to this:
>
>         date type count value
> 1 2006-04-01    A    10  99.6
> 2 2006-04-01    B     4  33.2
> 3 2006-04-02    A    22  43.2
> 4 2006-04-02    B     8  44.9
> 5 2006-04-03    A    12  12.4
> 6 2006-04-03    B    14  18.5
>
> ('date' is a factor, and my actual data frame has
> about 100 different 'types', not just two)
>
> I would like to sum up the 'counts' per 'type', and
> get an average of the 'values' per 'type'. In other
> words, I would like my results to look like this:
>
>   type count    value
> 1    A    44 51.73333
> 2    B    26 32.2
>
> The way I'm doing this now is to tear the table apart
> into its individual columns, then apply aggregate() to
> each column individually (using the 'type' column for
> the 'by' parameter), and finally putting everything
> back together, like this:
>
> > A.count = aggregate(A$count, list(type=A$type), sum)
> > A.value = aggregate(A$value, list(type=A$type), mean)
> > B = data.frame(type=A.count$type, count=A.count$x, value=A.value$x)
>
> My actual table is a bit more involved than in this
> simple example, however, so this becomes quite
> tedious.
>
> I am hoping that there is a simpler way for doing
> this, for example by providing different FUN
> parameters for each column to the aggregate()
> function.
>
> I would appreciate any suggestions.
> Thanks
> Klaus