[Rd] performance issue with as.Date

Paul.Ryan at csiro.au Paul.Ryan at csiro.au
Tue Apr 30 03:51:34 CEST 2013


We encountered a performance problem when running a large number of R scripts simultaneously.  The large number of stat() system calls on /etc/timezone limited how many scripts could be run effectively.  I traced the problem to as.Date.character where, when there is no format argument, strptime() is called without a tz argument.

as.Date.character <- function(x, format="", ...)
{
    charToDate <- function(x) {
        xx <- x[1L]
        if(is.na(xx)) {
            j <- 1L
            while(is.na(xx) && (j <- j+1L) <= length(x)) xx <- x[j]
            if(is.na(xx)) f <- "%Y-%m-%d" # all NAs
        }
        if(is.na(xx) ||
           !is.na(strptime(xx, f <- "%Y-%m-%d", tz="GMT")) ||
           !is.na(strptime(xx, f <- "%Y/%m/%d", tz="GMT"))
           ) return(strptime(x, f))
        stop("character string is not in a standard unambiguous format")
    }
    res <- if(missing(format)) charToDate(x) else strptime(x, format, tz="GMT")
    as.Date(res)
}
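
For reference, here is a minimal sketch of the variant I have in mind. The only difference from the code above is tz="GMT" on the final strptime() call. The name as_date_character_gmt is hypothetical, used so that nothing in base R is masked, and this is not a tested patch against the R sources.

```r
# Hypothetical variant of as.Date.character(); only the final strptime()
# call differs from the version quoted above (it now passes tz = "GMT").
as_date_character_gmt <- function(x, format = "", ...)
{
    charToDate <- function(x) {
        xx <- x[1L]
        if (is.na(xx)) {
            j <- 1L
            # find the first non-NA element, if any
            while (is.na(xx) && (j <- j + 1L) <= length(x)) xx <- x[j]
            if (is.na(xx)) f <- "%Y-%m-%d" # all NAs
        }
        if (is.na(xx) ||
            !is.na(strptime(xx, f <- "%Y-%m-%d", tz = "GMT")) ||
            !is.na(strptime(xx, f <- "%Y/%m/%d", tz = "GMT"))
            ) return(strptime(x, f, tz = "GMT"))  # tz added here
        stop("character string is not in a standard unambiguous format")
    }
    res <- if (missing(format)) charToDate(x) else strptime(x, format, tz = "GMT")
    as.Date(res)
}

d1 <- as_date_character_gmt("2013-04-30")
d2 <- as_date_character_gmt("2013/04/30")
```

Both calls take the no-format path, so every strptime() call involved is given an explicit time zone.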

We can easily work around this by specifying a format.  My question is: should the strptime(x, f) call have a tz argument, as it does in the case where a format is specified?
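
The workaround looks roughly like this. The input vector is made up for illustration; passing format means as.Date.character takes the strptime(x, format, tz="GMT") branch and never reaches charToDate().

```r
# Illustrative input only; any vector already in a known format would do.
x <- c("2013-04-30", NA, "2013-05-01")

# No format: goes through charToDate(), whose final strptime(x, f)
# call has no tz argument in the code quoted above.
d_guess <- as.Date(x)

# Explicit format: handled by strptime(x, format, tz = "GMT").
d_fixed <- as.Date(x, format = "%Y-%m-%d")

# The two results agree wherever the input is not NA.
same <- all(d_guess == d_fixed, na.rm = TRUE)
```

This only helps when the format of the input is known in advance, which was true for our scripts.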

Thanks,

Paul


