[R] How to extract last value in each group

arun smartpink111 at yahoo.com
Thu Aug 15 22:38:54 CEST 2013


I tried it again from a fresh start, using data.table alone:

 dt1 <- data.table(dat2, key=c('Date', 'Time'))
 system.time(ans <- dt1[, .SD[.N], by='Date'])
#   user  system elapsed 
# 40.908   0.000  40.981 
#Then tried:
system.time(res7<- dat2[cumsum(rle(dat2[,1])$lengths),])
 #  user  system elapsed 
 # 0.148   0.000   0.151  #same time as before
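
For anyone following along, here is a minimal sketch of what the rle/cumsum line does, on a small made-up data frame. The column names and values are assumptions for illustration, not the real dat2. It relies on the rows already being sorted so that each date forms one contiguous run.

```r
# Toy data (hypothetical values); rows are already sorted by Date, then Time.
dat <- data.frame(
  Date  = c("2013-08-01", "2013-08-01", "2013-08-02", "2013-08-02", "2013-08-02"),
  Time  = c("09:00", "16:00", "09:30", "12:00", "15:45"),
  Value = c(1.5, 2.5, 3.0, 4.0, 5.0),
  stringsAsFactors = FALSE
)

# rle() reports the length of each run of identical dates; the cumulative
# sum of those lengths is the row index at which each run ends.
# as.character() is used because rle() only accepts plain atomic vectors.
last_idx  <- cumsum(rle(as.character(dat$Date))$lengths)
last_rows <- dat[last_idx, ]
print(last_rows)
```

If the same date appeared in two separate runs, this would return one row per run rather than one per date, which is why it is only safe on sorted input.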





It might be CPU dependent. I use a Dell XPS L502X:



	* Processor 2nd Gen Core i7 Intel i7-2630QM / 2 GHz ( 2.9 GHz ) ( Quad-Core ) 
	* Memory 6 GB / 8 GB (max) 
	* Hard Drive 640 GB - Serial ATA-300 - 7200 rpm  

sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.8.8 stringr_0.6.2    reshape2_1.2.2  

loaded via a namespace (and not attached):
[1] plyr_1.8


----- Original Message -----
From: Steve Lianoglou <lianoglou.steve at gene.com>
To: arun <smartpink111 at yahoo.com>
Cc: William Dunlap <wdunlap at tibco.com>; Noah Silverman <noahsilverman at ucla.edu>; Michael Hannon <jmhannon.ucdavis at gmail.com>; David Winsemius <dwinsemius at comcast.net>; R help <r-help at r-project.org>
Sent: Thursday, August 15, 2013 3:52 PM
Subject: Re: [R] How to extract last value in each group

Hi,

Looks like you have some free time on your hands :-)

Something looks a bit off here, though: I was surprised to see the
time you reported for the data.table option:

> #separate the data.table creation step:
>  dt1 <- data.table(dat2, key=c('Date', 'Time'))
> system.time(ans <- dt1[, .SD[.N], by='Date'])
> # user  system elapsed
> # 38.500   0.000  38.566

When I do the same, this is what I get:

   user  system elapsed
  0.064   0.009   0.074

I know this is very much dependent on what type of CPU you are running
on, but unless you're running your tests on a Commodore 64, it looks like
something went wonky.

Lastly, neither here nor there: some of the solutions assumed the data
were already grouped and sorted for you, so there are clever ways to
pick off the last one (cumsum and the like). But I've found it prudent
to always assume that the data have been handed to me by a rather
clever and insidious adversary, so taking steps to ensure you are
getting what you want (whether using an index on a data.table, or some
combo of split + max/which.max) is probably a good way to go.
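
As a rough base-R illustration of that defensive approach, the sketch below sorts explicitly before picking the last row per date, so it does not depend on the input order. The data frame, its column names, and its values are made up for the example, and order() plus duplicated() is just one possible way to do it, not necessarily what anyone in the thread used.

```r
# Toy data (hypothetical values), deliberately shuffled.
dat <- data.frame(
  Date  = c("2013-08-02", "2013-08-01", "2013-08-02", "2013-08-01", "2013-08-02"),
  Time  = c("12:00", "16:00", "15:45", "09:00", "09:30"),
  Value = c(4.0, 2.5, 5.0, 1.5, 3.0),
  stringsAsFactors = FALSE
)

# Sort by Date, then Time, instead of trusting the incoming order.
# (Zero-padded "HH:MM" strings sort correctly as text.)
sorted <- dat[order(dat$Date, dat$Time), ]

# Scanning from the bottom, the first occurrence of each Date is that
# date's last row in the sorted data.
last_rows <- sorted[!duplicated(sorted$Date, fromLast = TRUE), ]
print(last_rows)
```

The sort costs extra time on large inputs, but the result no longer changes if the rows arrive in a different order.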

My 2 cents,

-steve

-- 
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech



