[R] How to do aggregate operations with non-scalar functions

Itay Furman itayf at u.washington.edu
Wed Apr 6 00:59:01 CEST 2005


Hi,

I have a data set, the structure of which is something like this:

> a <- rep(c("a", "b"), c(6,6))
> x <- rep(c("x", "y", "z"), c(4,4,4))
> df <- data.frame(a=a, x=x, r=rnorm(12))

The real data set has more than 1 million rows. The factors "a" and "x"
have about 70 levels each; combined, they split 'df' into ~900 subsets.
For each such subset I'd like to compute various statistics,
including quantiles, but I can't find an efficient way of
doing this. aggregate() gives me the desired structure,
namely one row per subset, but I can only use it to compute
a single quantile per call.

> aggregate(df[,"r"], list(a=a, x=x), quantile, probs=0.25)
  a x          x
1 a x  0.1693188
2 a y  0.1566322
3 b y -0.2677410
4 b z -0.6505710

With by() I can compute several quantiles per subset in a
single call, but the structure of the output is not
convenient for further analysis and visualization.

> by(df[,"r"], list(a=a, x=x), quantile, probs=c(0, 0.25))
a: a
x: x
         0%        25% 
-0.7727268  0.1693188 
---------------------------------------------------------- 
a: b
x: x
NULL
----------------------------------------------------------

[snip]

I would like to end up with a data frame like this:

  a x         0%        25%
1 a x -0.7727268  0.1693188 
2 a y -0.3410671  0.1566322 
3 b y -0.2914710 -0.2677410 
4 b z -0.8502875 -0.6505710

I checked sweep() and apply() and didn't see how to harness
them for that purpose.

So, is there a simple way to convert the object returned
by by() into a data.frame?
Or, is there a better way to go about this?
Finally, if I should roll my own coercion function, any tips?
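
To make the target concrete, here is a minimal sketch of one way the
layout above could be built without by(): split 'r' on the combined
factor, compute the quantiles for each piece, and rbind the pieces.
The names 'probs', 'key', 'q', 'first', 'labels' and 'res' are made up
for this illustration, set.seed() is only there to make the toy data
repeatable, and nothing here has been timed on data of the size
described above.

```r
# Toy data with the same layout as above (set.seed only for repeatability)
set.seed(1)
a <- rep(c("a", "b"), c(6, 6))
x <- rep(c("x", "y", "z"), c(4, 4, 4))
df <- data.frame(a = a, x = x, r = rnorm(12))

probs <- c(0, 0.25)

# One level per (a, x) combination that actually occurs in the data
key <- interaction(df$a, df$x, drop = TRUE)

# Quantiles per subset; each list element becomes one row of the matrix
q <- do.call(rbind, lapply(split(df$r, key), quantile, probs = probs))

# The first row of each subset supplies its a/x labels,
# reordered to match levels(key), i.e. the row order of q
first <- !duplicated(key)
labels <- df[first, c("a", "x")][order(key[first]), ]

# check.names = FALSE keeps the "0%" and "25%" column names intact
res <- data.frame(labels, q, check.names = FALSE)
rownames(res) <- NULL
res
```

Any other function that returns a fixed-length vector could probably
be substituted for quantile() in the lapply() call.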

 	Thank you very much in advance,
 	Itay

----------------------------------------------------------------
itayf at u.washington.edu  /  +1 (206) 543 9040  /  U of Washington



