[R] By processing on two variables at once?

Steve Lianoglou mailinglist.honeypot at gmail.com
Thu Nov 12 04:11:39 CET 2009


Hi,

On Wed, Nov 11, 2009 at 8:51 PM, zwarren <zack.warren at yahoo.com> wrote:
>
> Hello!
>
> I'm trying to run stats on two vars at a time in a big data frame.  I knew
> how to do this in SAS many years ago, but have half-forgotten that as well!
>
> I need, for instance, mean(value) by x-y combination:
> x   y   z   value
> 1   1   1    10
> 1   1   2    20
> 1   2   1    30
>
> with results:
> x   y   mean(value)
> 1   1    15
> 1   2    30

What happened to your "z" column?

Anyway, there are a few ways you can do this.

1. If you just want to use the standard library, try the aggregate
function. Roughly:

R> df <- data.frame(x=c(1,1,1), y=c(1,1,2), z=c(1,2,1), value=c(10,20,30))
R> aggregate(df, by=list(df$x, df$y), mean)
  Group.1 Group.2 x y   z value
1       1       1 1 1 1.5    15
2       1       2 1 2 1.0    30

2. You can try using the plyr library:

R> library(plyr)
R> ddply(df, .(x, y), summarise, z=mean(z), value=mean(value))
  x y   z value
1 1 1 1.5    15
2 1 2 1.0    30
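
Note that both calls above average every column, z included. If you only want mean(value) per x-y combination, as in your expected results, a minimal sketch using the formula interface to aggregate (base R only; the toy df is rebuilt here so the snippet runs on its own, and the name `res` is just for illustration) might look like this:

```r
# Same toy data as in the examples above
df <- data.frame(x = c(1, 1, 1), y = c(1, 1, 2), z = c(1, 2, 1),
                 value = c(10, 20, 30))

# Formula interface: average `value` within each x-y combination.
# z is not mentioned in the formula, so it is left out of the result.
res <- aggregate(value ~ x + y, data = df, FUN = mean)
print(res)
#   x y value
# 1 1 1    15
# 2 1 2    30
```

The left-hand side of the formula names the column to summarise and the right-hand side names the grouping columns, so further grouping variables can be added with `+`.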

HTH,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
