[R] get top n rows group by a column from a dataframe

William Dunlap wdunlap at tibco.com
Thu Sep 16 20:18:08 CEST 2010


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Tan, Richard
> Sent: Thursday, September 16, 2010 8:39 AM
> To: r-help at r-project.org
> Subject: [R] get top n rows group by a column from a dataframe
> 
> Hi, is there an R function like sql's TOP key word?
> 
>  
> 
> I have a dataframe that has 3 columns: company, person, salary
> 
>  
> 
> How do I get top 5 highest paid person for each company, and if I have
> fewer than 5 people for a company, just return all of them?

You could use ave() to assign a within-company ranking
to each person and later extract the rows with the ranks
you want (or sort by rank, etc.):

> set.seed(1)
> DF <-
data.frame(Company=sample(c("Ford","Toyota","GM"),size=18,replace=TRUE),
Person=LETTERS[1:18],Salary=runif(18)*1e5)
> DF <- within(DF, rank <- ave(Salary, Company,
FUN=function(x)rev(order(x))))
> DF
   Company Person    Salary rank
1     Ford      A 38003.518    4
2   Toyota      B 77744.522    2
3   Toyota      C 93470.523    1
4       GM      D 21214.252    6
5     Ford      E 65167.377    2
6       GM      F 12555.510    7
7       GM      G 26722.067    5
8   Toyota      H 38611.409    5
9   Toyota      I  1339.033    3
10    Ford      J 38238.796    3
11    Ford      K 86969.085    1
12    Ford      L 34034.900    5
13      GM      M 48208.012    4
14  Toyota      N 59956.583    6
15      GM      O 49354.131    3
16  Toyota      P 18621.760    4
17      GM      Q 82737.332    1
18      GM      R 66846.674    2
> subset(DF, rank==1)
   Company Person   Salary rank
3   Toyota      C 93470.52    1
11    Ford      K 86969.08    1
17      GM      Q 82737.33    1
> subset(DF, rank<=5)
   Company Person    Salary rank
1     Ford      A 38003.518    4
2   Toyota      B 77744.522    2
3   Toyota      C 93470.523    1
5     Ford      E 65167.377    2
7       GM      G 26722.067    5
8   Toyota      H 38611.409    5
9   Toyota      I  1339.033    3
10    Ford      J 38238.796    3
11    Ford      K 86969.085    1
12    Ford      L 34034.900    5
13      GM      M 48208.012    4
15      GM      O 49354.131    3
16  Toyota      P 18621.760    4
17      GM      Q 82737.332    1
18      GM      R 66846.674    2

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 
> 
>  
> 
> Thanks,
> 
> Richard
> 
>  
> 
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 



More information about the R-help mailing list