[R] Subsetting for the ten highest values by group in a dataframe

Phil Spector spector at stat.berkeley.edu
Fri Jan 27 23:08:01 CET 2012


Sam -
    I think that subset is what's throwing you off here --
you need a function that will simply return the 10 rows of
each group with the highest values of x:

function(dat)dat[order(dat$x,decreasing=TRUE)[1:10],]

Then

ddply(df,'z',function(dat)dat[order(dat$x,decreasing=TRUE)[1:10],])

should give you what you want.  In this simple case, you could
also use

do.call(rbind,by(df,df$z,function(dat)dat[order(dat$x,decreasing=TRUE)[1:10],]))

from base R to get the same result.
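
As for diagnosing the original attempt, a rough sketch follows. It assumes
the example data from the quoted message. The seed, the helper name
top_n_rows and its argument n are made up for illustration only. The
comparison x == sort(x, TRUE)[1:10] is element-wise, so the ten sorted
values are recycled along x and a row is kept only when its value happens
to line up with the recycled position. Each group has 200 rows, a multiple
of 10, so no warning is issued. The sketch also swaps [1:10] for head(),
which should avoid all-NA padding rows if a group ever has fewer than ten
rows.

```r
## Illustrative only: seed, top_n_rows and n are not from the original post.
set.seed(1)
df <- data.frame(x = rnorm(400, mean = 20), y = 1:400, z = c("A", "B"))

## 1. Why `==` against ten values misleads: recycling, not membership.
a <- df[df$z == "A", ]
top10 <- sort(a$x, decreasing = TRUE)[1:10]
n_equal <- sum(a$x == top10)    # position-dependent, typically only a few
n_in    <- sum(a$x %in% top10)  # membership test: all ten rows

## 2. Top-n rows of one group; head() never pads with NA rows.
top_n_rows <- function(dat, n = 10) {
  head(dat[order(dat$x, decreasing = TRUE), , drop = FALSE], n)
}

## Apply per group and stack the pieces, as in the by() call above.
res <- do.call(rbind, by(df, df$z, top_n_rows))

## 3. Behaviour on a group smaller than n.
small <- df[1:3, ]
padded  <- small[order(small$x, decreasing = TRUE)[1:10], ]  # 10 rows, 7 of them NA
trimmed <- top_n_rows(small)                                 # 3 rows
```

With the data above every group has 200 rows, so both forms return the
same ten rows per group; the head() variant only matters for short groups.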

Hope this helps.
 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu

On Fri, 27 Jan 2012, Sam Albers wrote:

> Hello,
>
> I am looking for a way to subset a data frame by choosing the ten
> highest values of a variable, separately within each level of a
> factor.
>
> ## I've used plyr here but I'm not married to this approach
> require(plyr)
>
> ## I've created a data.frame with two groups and then an id variable (y)
> df <- data.frame(x=rnorm(400, mean=20), y=1:400, z=c("A","B"))
>
> ## So using ddply I can find the highest value of x
> df.max1 <- ddply(df, c("z"), subset, x==sort(x, TRUE)[1])
>
> ## Or the 2nd highest value
> df.max2 <- ddply(df, c("z"), subset, x==sort(x, TRUE)[2])
>
> ## And so on.... but when I try to make a series of numbers like so
> ## to get the top ten values, I don't get a warning message but
> ## two values that don't really make sense to me
> df.max <- ddply(df, c("z"), subset, x==sort(x, TRUE)[1:10])
>
> ## So no error message when I use the method above, which is clearly wrong.
> ## But I really am not sure how to diagnose the problem.
>
> ## Can anyone suggest a way to subset a data.frame with groups to
> ## select the top ten max values in that data.frame for each group?
>
> ## Thanks so much in advance!
>
> Sam
