[R] Sorting and subsetting

Mon Sep 20 20:11:16 CEST 2010

Richard Tan asked a very similar question last week
('get top n rows group by a column from a dataframe').
You could use ave() to compute each row's sequence number
within its group and keep the rows where that number is at most N:
   tmp[ave(integer(nrow(tmp)), tmp$index, FUN=seq_along)<=N, ]
Rows are taken in their current order within each group, so
tmp should already be sorted, as it is in your example.
If a given index has fewer than N rows, this returns all of
them; it does not pad the group out to N rows.
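
To make the ave() one-liner easier to try, here is a minimal
self-contained sketch built on the data frame from the question.
The set.seed() call, the value N <- 5, and the helper names pos and
manual are my own additions for illustration; they are not part of
the original posts.

```r
# Data frame from the original question, sorted by index and then foo.
set.seed(1)  # assumption: seed added only so the example is repeatable
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))
tmp <- tmp[order(tmp$index, tmp$foo), ]

N <- 5  # assumption: number of rows to keep per group

# ave() applies seq_along() within each level of tmp$index, so pos
# holds 1, 2, 3, ... for the rows of each group in their current order.
pos <- ave(integer(nrow(tmp)), tmp$index, FUN = seq_along)

# Keep only the rows whose within-group position is at most N.
result <- tmp[pos <= N, ]

# For comparison, the subset/rbind approach from the question.
manual <- rbind(subset(tmp, index == 1)[1:N, ],
                subset(tmp, index == 2)[1:N, ])

# Rows kept per group, and a check that both approaches agree on foo.
table(result$index)
all.equal(result$foo, manual$foo)
```

Because the grouping is handled by ave(), the same line should work
unchanged for any number of index levels, without one subset() call
per group.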

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Doran, Harold
> Sent: Monday, September 20, 2010 10:16 AM
> To: R-help
> Subject: [R] Sorting and subsetting
> 
> Suppose I have a data frame, such as the one below:
> 
> tmp <- data.frame(index = gl(2,20), foo = rnorm(40))
> 
> And further assume it is sorted by index and then by the variable foo.
> 
> tmp <- tmp[order(tmp$index, tmp$foo) , ]
> 
> Now, I want to grab the first N rows of tmp for each index. 
> In the end, what I want is the data frame 'result'
> 
> tmp1 <- subset(tmp, index == 1)
> tmp2 <- subset(tmp, index == 2)
> 
> tmp1 <- tmp1[1:5,]
> tmp2 <- tmp2[1:5,]
> result <- rbind(tmp1, tmp2)
> 
> Does anyone see a way to subset and subsequently bind without a loop?
> 
> Harold