[R] Partial aggregate on sorted data

jim holtman jholtman at gmail.com
Wed Oct 24 15:47:33 CEST 2007


Is this something like you want:

> set.seed(1)
> test <- data.frame(value=runif(100), fact=sample(LETTERS[1:5], 100, TRUE))
> result <- tapply(test$value, test$fact, function(x, sort, subset){
+     x <- x[order(x, decreasing=(sort == "DESCENDING"))]
+     mean(head(x, length(x) * subset))
+ }, sort="DESCENDING", subset=.33)
> result
        A         B         C         D         E
0.8302502 0.8583468 0.7461504 0.7594074 0.9143997
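
If you would rather stay with aggregate(), here is a rough sketch of the same idea. aggregate() passes extra arguments on to FUN, so the sorting and subsetting can live in a small helper. The helper name top_fraction_mean and its arguments frac and descending are made up for this illustration; they are not part of base R. Unlike the tapply() version above, this sketch always keeps at least one value per group, and sort() drops NAs by default.

```r
# Hypothetical helper (not part of base R): mean of the top or bottom
# fraction of a numeric vector.
top_fraction_mean <- function(x, frac = 0.33, descending = TRUE) {
  # sort() drops NA values by default
  x <- sort(x, decreasing = descending)
  # keep at least one value, even for very small groups
  n <- max(1, floor(length(x) * frac))
  mean(x[seq_len(n)])
}

set.seed(1)
test <- data.frame(value = runif(100),
                   fact = sample(LETTERS[1:5], 100, TRUE))

# Extra arguments after FUN are handed to top_fraction_mean()
agg <- aggregate(test$value, list(fact = test$fact),
                 top_fraction_mean, frac = 0.33, descending = TRUE)
print(agg)
```

The result is a data frame with one row per level of fact and the per-group mean in column x. Setting descending = FALSE gives the mean of the bottom third instead.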


On 10/24/07, Yves Moisan <ymoisan at groupesm.com> wrote:
>
> Hi All,
>
> I'm looking for ways to compute aggregate statistics (with the aggregate
> function) but with an option for sorting and selecting a subset of the data
> frame.  For example, I would like to turn this:
>
> aggregate(myDataframe$TargetValue,list(SomeFactor =
> myDataframe$SomeFactor),mean)
>
> into something like
>
> aggregate(myDataframe$TargetValue,list(SomeFactor =
> myDataframe$SomeFactor),mean, sort=DESCENDING, subset=0.33)
>
> where sort would sort TargetValue per factor level and subset would be (for
> example) a value between 0 and 1.  The example above would give me the mean
> for the top third of TargetValue per factor.
>
> Any way of doing this without having to use temporary variables to stuff my
> vectors, use length(), etc ?
> --
> View this message in context: http://www.nabble.com/Partial-aggregate-on-sorted-data-tf4683988.html#a13384556
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


