[R] summarize dataframe based on multiple cols, not their combinations

Alexander Shenkin ashenkin at ufl.edu
Wed Mar 20 20:57:36 CET 2013


Hi folks,

I'm trying to figure out how to get summarized data based on multiple
columns.  However, instead of giving summaries for every combination of
categorical columns, I want it for each value of each categorical column
regardless of the other columns.  I could do this with three different
commands, but I'm wondering if there's a more elegant way that I'm
missing.  Thanks!

allie

> my_df = data.frame(a = c(1,1,1,0,0,0), b=c(0,0,0,1,1,1),
c=c(1,0,1,0,1,0), dat=c(10,11,12,13,14,15))

> my_df
  a b c dat
1 1 0 1  10
2 1 0 0  11
3 1 0 1  12
4 0 1 0  13
5 0 1 1  14
6 0 1 0  15

> library(plyr)  # provides ddply() and .()
> # not what I want
> ddply(my_df, .(a,b,c), function(x) c("mean"=mean(x$dat), "n"=nrow(x)))
  a b c mean n
1 0 1 0   14 2
2 0 1 1   14 1
3 1 0 0   11 1
4 1 0 1   11 2

What I want:
  a b c mean n
1 1 * *   11 3
2 * 1 *   14 3
3 * * 1   12 3

where "*" refers to any value of the other columns.
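
For reference, here is a rough base R sketch of one way the table above could be produced. It is only an illustration, not necessarily the most elegant route: it loops over the categorical columns and summarizes `dat` for the rows where that column equals 1, ignoring the other columns. The names `cols` and `res` and the `column` label are made up for this example, and the restriction to the value 1 is an assumption taken from the desired output shown above.

```r
my_df <- data.frame(a = c(1,1,1,0,0,0), b = c(0,0,0,1,1,1),
                    c = c(1,0,1,0,1,0), dat = c(10,11,12,13,14,15))

# Hypothetical helper vector: the categorical columns to summarize over
cols <- c("a", "b", "c")

# For each column, keep the rows where it equals 1 (an assumption based on
# the desired output) and compute the mean and count of dat for those rows.
res <- do.call(rbind, lapply(cols, function(col) {
  rows <- my_df[[col]] == 1
  data.frame(column = col,
             mean = mean(my_df$dat[rows]),
             n = sum(rows),
             stringsAsFactors = FALSE)
}))

res
```

Each row of `res` corresponds to one categorical column, so the three separate commands collapse into a single `lapply()` call. Summarizing every value of each column (0 as well as 1) would need an inner grouping step, for example `aggregate()` per column.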


