[R] Select the last two rows by id group

Tue Mar 20 16:53:08 CET 2007

Very nice! This almost duplicates the SAS first.var and last.var
ability to choose the first and last observations by group(s).
Substituting head() where Marc uses tail() below adapts it to the
first n. It is more flexible than the SAS approach because it can
select the first/last n rather than just the single first or last.
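
As a minimal sketch of that substitution, assuming the score data frame
from Marc's message quoted below (rebuilt here so the snippet runs on
its own):

```r
# Rebuild the example data from the quoted message
score <- data.frame(id      = c(1, 1, 1, 2, 3, 3),
                    reading = c(65, 70, 88, NA, 90, NA),
                    math    = c(80, 75, 70, 65, 65, 70))

# First 2 rows per id: head() in place of tail()
first2 <- score[unlist(tapply(rownames(score), score$id, head, 2)), ]

# Last 2 rows per id, as in Marc's example
last2 <- score[unlist(tapply(rownames(score), score$id, tail, 2)), ]
```

Groups with fewer than n rows (id 2 here) simply return all the rows
they have.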

Let's say we want to choose the last observation in a county, and
counties have duplicate names in different states. You could sort by
state, then county, then use only county where Marc uses score$id in his
last example below, and it would get the last record for *every* county
regardless of duplicates. Does this sound correct? 
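
One way to check this, as a hedged sketch on a made-up data frame (the
name counties and its columns state, county, and value are
hypothetical): grouping on county alone pools same-named counties
across states no matter how the rows are sorted, so only one record per
county name comes back. Grouping on state and county together keeps
them apart.

```r
# Hypothetical data: a "Union" county exists in both KY and TN
counties <- data.frame(state  = c("KY", "KY", "TN", "TN", "TN"),
                       county = c("Union", "Union", "Knox", "Union", "Union"),
                       value  = 1:5,
                       stringsAsFactors = FALSE)

# Grouping on county alone: both Union counties fall into one group
by_county <- do.call("rbind",
                     lapply(split(counties, counties$county), tail, 1))

# Grouping on state and county together: one last row per state/county pair.
# drop = TRUE omits state/county combinations that never occur.
by_both <- do.call("rbind",
                   lapply(split(counties,
                                list(counties$state, counties$county),
                                drop = TRUE),
                          tail, 1))

nrow(by_county)  # 2 rows: Knox and a single Union
nrow(by_both)    # 3 rows: TN Knox, KY Union, TN Union
```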

That's a handy bit of code!

Cheers,
Bob

=========================================================
  Bob Muenchen (pronounced Min'-chen), Manager  
  Statistical Consulting Center
  U of TN Office of Information Technology
  200 Stokely Management Center, Knoxville, TN 37996-0520
  Voice: (865) 974-5230  
  FAX:   (865) 974-4810
  Email: muenchen at utk.edu
  Web:   http://oit.utk.edu/scc
  News:  http://listserv.utk.edu/archives/statnews.html
=========================================================

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Marc Schwartz
> Sent: Tuesday, March 20, 2007 10:59 AM
> To: Lauri Nikkinen
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Select the last two rows by id group
> 
> On Tue, 2007-03-20 at 16:33 +0200, Lauri Nikkinen wrote:
> > Hi R-users,
> >
> > Following this post
> > http://tolstoy.newcastle.edu.au/R/help/06/06/28965.html,
> > how do I get the last two rows (or six or ten) by id group out of
> > the data frame? The example there gives just the last row.
> >
> > Sincere thanks,
> > Lauri
> 
> A slight modification to Gabor's solution:
> 
> > score
>   id reading math
> 1  1      65   80
> 2  1      70   75
> 3  1      88   70
> 4  2      NA   65
> 5  3      90   65
> 6  3      NA   70
> 
> # Return the last '2' rows
> # Note the addition of unlist()
> 
> > score[unlist(tapply(rownames(score), score$id, tail,  2)), ]
>   id reading math
> 2  1      70   75
> 3  1      88   70
> 4  2      NA   65
> 5  3      90   65
> 6  3      NA   70
> 
> 
> Note that when tail() returns more than one value, tapply() will
> create a list rather than a vector:
> 
> > tapply(rownames(score), score$id, tail,  2)
> $`1`
> [1] "2" "3"
> 
> $`2`
> [1] "4"
> 
> $`3`
> [1] "5" "6"
> 
> 
> Thus, we need to unlist() the indices to use them in the subsetting
> process that Gabor used in his solution.
> 
> Another alternative, if the rownames do not correspond to the
> sequential row indices as they do in this example:
> 
> > do.call("rbind", lapply(split(score, score$id), tail,  2))
>     id reading math
> 1.2  1      70   75
> 1.3  1      88   70
> 2    2      NA   65
> 3.5  3      90   65
> 3.6  3      NA   70
> 
> 
> This uses split() to create a list of data frames from score, where
> each data frame is 'split' by the 'id' column values. tail() is then
> applied to each data frame using lapply(), the results of which are
> then rbind()ed back to a single data frame.
> 
> HTH,
> 
> Marc Schwartz