[R] get top n rows group by a column from a dataframe

Phil Spector spector at stat.berkeley.edu
Thu Sep 16 19:44:44 CEST 2010


Richard -
    Is this what you're looking for?

> sdata = data.frame(company=sample(LETTERS[1:8],1000,replace=TRUE),
                      person=1:1000,
                      salary=rnorm(1000))
> splitsdata = split(sdata,sdata$company)
> res = do.call(rbind,sapply(splitsdata,simplify=FALSE,
                        function(x)head(x[order(x$salary,decreasing=TRUE),],5)))
> row.names(res) = NULL
> res
    company person   salary
1        A    560 2.721923
2        A    538 2.456439
3        A    594 2.093376
4        A    947 1.960166
5        A    334 1.544756
6        B    671 2.484698
7        B    533 2.328799
          . . .
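
The split / order / rbind steps above could also be wrapped in a small reusable function. What follows is only a sketch: the name top_n_by_group, its arguments, and the toy data frame pay are made up for illustration and are not from any package. It uses head() so that a company with fewer than n people returns all of its rows, rather than being padded with NA rows.

```r
# Hypothetical helper (name and arguments are illustrative only):
# return the top `n` rows of `df` within each level of column `group`,
# ranked by column `value`, largest first.
top_n_by_group <- function(df, group, value, n = 5) {
  # one data frame per group; drop = TRUE skips empty factor levels
  pieces <- split(df, df[[group]], drop = TRUE)
  tops <- lapply(pieces, function(x) {
    ranked <- x[order(x[[value]], decreasing = TRUE), , drop = FALSE]
    head(ranked, n)   # gives all rows when the group has fewer than n
  })
  res <- do.call(rbind, tops)
  row.names(res) <- NULL
  res
}

# Toy data: company A has 7 people, company B has only 3
pay <- data.frame(company = c(rep("A", 7), rep("B", 3)),
                  person  = 1:10,
                  salary  = c(10, 70, 30, 60, 20, 50, 40, 5, 15, 25))
top5 <- top_n_by_group(pay, "company", "salary", 5)
top5
```

With the toy data, company A contributes its five highest salaries and company B contributes all three of its rows.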
 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu



On Thu, 16 Sep 2010, Tan, Richard wrote:

> Hi Richard
>
>
>
> Thanks for the suggestion, but I want the top 5 salaries for each company,
> not the top 5 of the whole list.  I don't see how your approach would do that.
>
>
>
> Thanks,
>
> Richard
>
>
>
> From: RICHARD M. HEIBERGER [mailto:rmh at temple.edu]
> Sent: Thursday, September 16, 2010 11:53 AM
> To: Tan, Richard
> Cc: r-help at r-project.org
> Subject: Re: [R] get top n rows group by a column from a dataframe
>
>
>
>> tmp <- data.frame(matrix(rnorm(30), 10, 3,
>                           dimnames=list(letters[1:10],
>                                         c("company", "person", "salary"))))
>> tmp
>      company     person      salary
> a -1.04590176 -0.7841855  1.07150503
> b -1.06643101  0.6545647  0.43920454
> c  0.72894531 -1.3812867  0.41313659
> d -0.39265263 -0.3871271  0.69404325
> e  0.54028124  0.7124772  0.66630904
> f -1.46931714 -0.3823353  0.03069797
> g -0.33283666 -0.6351862  0.37920017
> h -0.79977129  0.2605315  0.92373900
> i  0.80614119  0.3727227 -1.16560563
> j  0.03165012  0.4690400 -0.81966285
>> order(tmp$person, decreasing=TRUE)[1:min(5, length(tmp$person))]
> [1]  5  2 10  9  8
>> tmp[order(tmp$person, decreasing=TRUE)[1:min(5, length(tmp$person))],]
>      company    person     salary
> e  0.54028124 0.7124772  0.6663090
> b -1.06643101 0.6545647  0.4392045
> j  0.03165012 0.4690400 -0.8196628
> i  0.80614119 0.3727227 -1.1656056
> h -0.79977129 0.2605315  0.9237390
>
> You can easily write a function for that.
> top <- function(DF, varname, howmany) {}
>
>
> On Thu, Sep 16, 2010 at 11:39 AM, Tan, Richard <RTan at panagora.com> wrote:
>
> 	Hi, is there an R function like SQL's TOP keyword?
>
> 	I have a data frame that has 3 columns: company, person, salary.
>
> 	How do I get the 5 highest-paid people for each company, and if I have
> 	fewer than 5 people for a company, just return all of them?
>
> 	Thanks,
>
> 	Richard
>


