[R] summarize dataframe based on multiple cols, not their combinations

John Kane jrkrideau at inbox.com
Wed Mar 20 21:24:17 CET 2013


Will this do?

library(plyr)

# Mean and row count of dat for each value of column a
ddply(my_df, .(a), summarize, mm = mean(dat), number = length(dat))
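
The call above summarizes by column a only. If the goal is the same summary for each of a, b and c separately, one way might be to run a per-column summary in a loop and stack the pieces. Below is a minimal sketch in base R (so it does not depend on plyr); the helper name summarize_by and the output column names are my own assumptions, not anything from the original post.

```r
# Example data from the original question
my_df <- data.frame(a = c(1, 1, 1, 0, 0, 0), b = c(0, 0, 0, 1, 1, 1),
                    c = c(1, 0, 1, 0, 1, 0), dat = c(10, 11, 12, 13, 14, 15))

# Hypothetical helper: mean and count of dat for each value of ONE column,
# ignoring whatever values the other columns take.
summarize_by <- function(df, col) {
  groups <- split(df$dat, df[[col]])  # one numeric vector per value of col
  data.frame(column = col,
             value  = names(groups),
             mean   = unname(vapply(groups, mean, numeric(1))),
             n      = unname(vapply(groups, length, integer(1))))
}

# Apply the helper to each grouping column and stack the results
group_cols <- c("a", "b", "c")
result <- do.call(rbind, lapply(group_cols, summarize_by, df = my_df))
print(result)

# Keep only the rows where the grouping column equals 1
wanted <- result[result$value == "1", ]
print(wanted)
```

The rows in wanted should have means 11, 14 and 12 with n = 3 each, which matches the "What I want" table in the quoted message. The same loop could presumably wrap the ddply call instead, at the cost of renaming the grouping column in each piece before stacking.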

John Kane
Kingston ON Canada


> -----Original Message-----
> From: ashenkin at ufl.edu
> Sent: Wed, 20 Mar 2013 14:57:36 -0500
> To: r-help at r-project.org
> Subject: [R] summarize dataframe based on multiple cols, not their
> combinations
> 
> Hi folks,
> 
> I'm trying to figure out how to get summarized data based on multiple
> columns.  However, instead of giving summaries for every combination of
> categorical columns, I want it for each value of each categorical column
> regardless of the other columns.  I could do this with three different
> commands, but I'm wondering if there's a more elegant way that I'm
> missing.  Thanks!
> 
> allie
> 
>> my_df = data.frame(a = c(1,1,1,0,0,0), b=c(0,0,0,1,1,1),
> c=c(1,0,1,0,1,0), dat=c(10,11,12,13,14,15))
> 
>> my_df
>   a b c dat
> 1 1 0 1  10
> 2 1 0 0  11
> 3 1 0 1  12
> 4 0 1 0  13
> 5 0 1 1  14
> 6 0 1 0  15
> 
>> # not what I want
>> ddply(my_df, .(a,b,c), function(x) c("mean"=mean(x$dat), "n"=nrow(x)))
>   a b c mean n
> 1 0 1 0   14 2
> 2 0 1 1   14 1
> 3 1 0 0   11 1
> 4 1 0 1   11 2
> 
> What I want:
>   a b c mean n
> 1 1 * *   11 3
> 2 * 1 *   14 3
> 3 * * 1   12 3
> 
> where "*" refers to any value of the other columns.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



