[R] How to extract last value in each group

Steve Lianoglou lianoglou.steve at gene.com
Wed Aug 14 23:22:06 CEST 2013


Or with plyr:

R> library(plyr)
R> ans <- ddply(x, .(Date), function(df) df[which.max(df$Time),])
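
If plyr is not installed, roughly the same split-apply-combine idea can be sketched in base R. This is only an illustration: the small data frame below copies four rows (and three of the columns) from the sample data further down the thread, and the names "pieces" and "ans" are placeholders of mine.

```r
# Four rows and three columns copied from the sample data in this thread.
x <- read.table(text = "
Date Time C
06/01/2010 1358 136.35
06/01/2010 1700 136.55
06/02/2010 331 136.70
06/02/2010 338 136.80
", header = TRUE, stringsAsFactors = FALSE)

# Split into one data.frame per Date, keep the row with the largest Time
# in each piece, then bind the pieces back together.
pieces <- split(x, x$Date)
ans <- do.call(rbind, lapply(pieces, function(df) df[which.max(df$Time), ]))
rownames(ans) <- NULL  # drop the per-group row names added by rbind
ans
```

Like the ddply call, this picks the row with the largest Time per Date, so it does not depend on the rows being sorted.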

-steve

On Wed, Aug 14, 2013 at 2:18 PM, Steve Lianoglou
<lianoglou.steve at gene.com> wrote:
> While we're playing code golf, likely faster still could be to use
> data.table. Assume your data is in a data.frame named "x":
>
> R> library(data.table)
> R> x <- data.table(x, key=c('Date', 'Time'))
> R> ans <- x[, .SD[.N], by='Date']
>
> -steve
>
> On Wed, Aug 14, 2013 at 2:01 PM, William Dunlap <wdunlap at tibco.com> wrote:
>> A somewhat faster version (for datasets with lots of dates, assuming the data is sorted by date and time) is
>>   isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
>>   f3 <- function(dataFrame) {
>>       dataFrame[ isLastInRun(dataFrame$Date), ]
>>   }
>> where your two suggestions, as functions, are
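>>   # Assumes rows are grouped by Date in sorted order; ties in Time return several rows.
>>   f1 <- function (dataFrame) {
>>       dataFrame[unlist(with(dataFrame, tapply(Time, list(Date), FUN = function(x) x == max(x)))), ]
>>   }
>>   # Also assumes the largest Time is the last row of each Date, so that
>>   # cumsum() of the within-group positions gives absolute row numbers.
>>   f2 <- function (dataFrame) {
>>       dataFrame[cumsum(with(dataFrame, tapply(Time, list(Date), FUN = which.max))), ]
>>   }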
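
To see what isLastInRun() is doing, here is a minimal sketch on a short vector. The vector "dates" is made up for illustration (it just repeats the two dates from the sample data), and the rle() comparison is my own cross-check, not part of Bill's message.

```r
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)

# Hypothetical, already-sorted vector of dates: a run of 2, then a run of 3.
dates <- c("06/01/2010", "06/01/2010",
           "06/02/2010", "06/02/2010", "06/02/2010")

# TRUE wherever the next element differs, plus TRUE for the final element.
flags <- isLastInRun(dates)
last_idx <- which(flags)

# Cross-check: the end of each run is the cumulative sum of the run lengths.
run_ends <- cumsum(rle(dates)$lengths)

flags
last_idx
run_ends
```

Both last_idx and run_ends should point at the final row of each run, which is why f3 only makes sense when the rows are already sorted by date and time.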
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
>>> Of arun
>>> Sent: Wednesday, August 14, 2013 1:08 PM
>>> To: Noah Silverman
>>> Cc: R help
>>> Subject: Re: [R] How to extract last value in each group
>>>
>>> Hi,
>>> Try:
>>> dat1<- read.table(text="
>>>         Date Time      O      H      L      C  U  D
>>> 06/01/2010 1358 136.40 136.40 136.35 136.35  2  12
>>> 06/01/2010 1359 136.40 136.50 136.35 136.50  9  6
>>> 06/01/2010 1400 136.45 136.55 136.35 136.40  8  7
>>> 06/01/2010 1700 136.55 136.55 136.55 136.55  1  0
>>> 06/02/2010  331 136.55 136.70 136.50 136.70  36  6
>>> 06/02/2010  332 136.70 136.70 136.65 136.65  3  1
>>> 06/02/2010  334 136.75 136.75 136.75 136.75  1  0
>>> 06/02/2010  335 136.80 136.80 136.80 136.80  4  0
>>> 06/02/2010  336 136.80 136.80 136.80 136.80  8  0
>>> 06/02/2010  337 136.75 136.80 136.75 136.80  1  2
>>> 06/02/2010  338 136.80 136.80 136.80 136.80  3  0
>>> ",sep="",header=TRUE,stringsAsFactors=FALSE)
>>>
>>> dat1[unlist(with(dat1,tapply(Time,list(Date),FUN=function(x) x==max(x)))),]
>>> #         Date Time      O      H      L      C U D
>>> #4  06/01/2010 1700 136.55 136.55 136.55 136.55 1 0
>>> #11 06/02/2010  338 136.80 136.80 136.80 136.80 3 0
>>> #or
>>> dat1[cumsum(with(dat1,tapply(Time,list(Date),FUN=which.max))),]
>>> #         Date Time      O      H      L      C U D
>>> #4  06/01/2010 1700 136.55 136.55 136.55 136.55 1 0
>>> #11 06/02/2010  338 136.80 136.80 136.80 136.80 3 0
>>>
>>> #or
>>> dat1[as.logical(with(dat1,ave(Time,Date,FUN=function(x) x==max(x)))),]
>>> #         Date Time      O      H      L      C U D
>>> #4  06/01/2010 1700 136.55 136.55 136.55 136.55 1 0
>>> #11 06/02/2010  338 136.80 136.80 136.80 136.80 3 0
>>> A.K.
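
One more base R possibility, offered only as a sketch: if the rows are already in time order within each day (as in the sample), duplicated() with fromLast=TRUE flags everything except the last row per Date. The data frame "dat" below reuses four Date/Time pairs from the sample data; "last_rows" is a name I made up.

```r
# Four Date/Time pairs taken from the sample data in this thread.
dat <- data.frame(
  Date = c("06/01/2010", "06/01/2010", "06/02/2010", "06/02/2010"),
  Time = c(1358L, 1700L, 331L, 338L),
  stringsAsFactors = FALSE
)

# Scanning from the bottom, the first occurrence of each Date is its last row,
# so negating duplicated(..., fromLast = TRUE) keeps exactly those rows.
last_rows <- dat[!duplicated(dat$Date, fromLast = TRUE), ]
last_rows
```

Unlike the x == max(x) versions, this never returns more than one row per Date, but it relies entirely on row order rather than on the Time values.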
>>>
>>>
>>>
>>>
>>> ----- Original Message -----
>>> From: Noah Silverman <noahsilverman at ucla.edu>
>>> To: "R-help at r-project.org" <r-help at r-project.org>
>>> Cc:
>>> Sent: Wednesday, August 14, 2013 3:56 PM
>>> Subject: [R] How to extract last value in each group
>>>
>>> Hello,
>>>
>>> I have some stock pricing data for one minute intervals.
>>>
>>> The delivery format is a bit odd.  The date column is easily parsed and used as an index
>>> for an its object.  However, the time column is just an integer (1:1807)
>>>
>>> I just need to extract the *last* entry for each day.  Don't actually care what time it was,
>>> as long as it was the last one.
>>>
>>> Sure, writing a big nasty loop would work, but I was hoping that someone would be able
>>> to suggest a faster way.
>>>
>>> Small snippet of data below my sig.
>>>
>>> Thanks!
>>>
>>>
>>> --
>>> Noah Silverman, M.S., C.Phil
>>> UCLA Department of Statistics
>>> 8117 Math Sciences Building
>>> Los Angeles, CA 90095
>>>
>>> --------------------------------------------------------------------------
>>>
>>>         Date Time      O      H      L      C  U  D
>>> 06/01/2010 1358 136.40 136.40 136.35 136.35   2  12
>>> 06/01/2010 1359 136.40 136.50 136.35 136.50   9   6
>>> 06/01/2010 1400 136.45 136.55 136.35 136.40   8   7
>>> 06/01/2010 1700 136.55 136.55 136.55 136.55   1   0
>>> 06/02/2010  331 136.55 136.70 136.50 136.70  36   6
>>> 06/02/2010  332 136.70 136.70 136.65 136.65   3   1
>>> 06/02/2010  334 136.75 136.75 136.75 136.75   1   0
>>> 06/02/2010  335 136.80 136.80 136.80 136.80   4   0
>>> 06/02/2010  336 136.80 136.80 136.80 136.80   8   0
>>> 06/02/2010  337 136.75 136.80 136.75 136.80   1   2
>>> 06/02/2010  338 136.80 136.80 136.80 136.80   3   0
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>
>
>
> --
> Steve Lianoglou
> Computational Biologist
> Bioinformatics and Computational Biology
> Genentech



-- 
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech


