[Rd] split.Date

McGehee, Robert Robert.McGehee at geodecapital.com
Tue Jul 8 23:39:53 CEST 2008

I would like to suggest that the split.Date method below be added to the
base package to significantly speed up splits of values of class Date.
The example below shows a speed improvement of about 175x for 1e4 data
points. On a vector of size 1e6, split.default took 22 minutes versus
0.3 seconds for the split.Date function below (!). Note that this change
should also substantially improve the performance of tapply on objects
of class Date.


split.Date <- function(x, f, drop=FALSE) {
    # Split the underlying integer day counts, then restore the Date class.
    x <- split.default(as.integer(x), f, drop=drop)
    for (i in seq_along(x)) class(x[[i]]) <- "Date"
    x
}

> vals <- round(1000*rnorm(1e4))
> date <- rep(Sys.Date() + -1:1, length.out=1e4)
> system.time(x1 <- split.default(date, vals))
   user  system elapsed 
  7.718   0.042   7.761 
> system.time(x2 <- split.Date(date, vals))
   user  system elapsed 
  0.044   0.000   0.044 
> all.equal(x1, x2)
[1] TRUE
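
For illustration only, the same idea (split the plain day counts, then put
the class back on each group) can be sketched without the explicit loop.
This is not the proposed patch: the name split_date_sketch is hypothetical,
and the sketch assumes x is an ordinary Date vector with no attributes
other than its class.

```r
# Hypothetical helper for illustration; not part of base R.
split_date_sketch <- function(x, f, drop = FALSE) {
    # unclass() exposes the numeric day counts stored inside a Date vector,
    # so split.default works on a plain vector with no class attribute.
    parts <- split.default(unclass(x), f, drop = drop)
    # Restore the Date class on every group; lapply keeps the group names.
    lapply(parts, structure, class = "Date")
}

# Small usage example with made-up data.
d <- as.Date("2008-07-08") + 0:5
g <- c("a", "b", "a", "b", "a", "b")
res <- split_date_sketch(d, g)
```

Here res is a list with one Date vector per level of g, which can be
compared against split.default(d, g) with all.equal as in the timing
example above.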

Robert McGehee, CFA
Geode Capital Management, LLC
One Post Office Square, 28th Floor | Boston, MA | 02109
Tel: 617/392-8396    Fax:617/476-6389
mailto:robert.mcgehee at geodecapital.com
