[R] Quirks with system.time and simulations

Gabor Grothendieck ggrothendieck at myway.com
Mon Jun 14 04:41:00 CEST 2004


I don't know the answer but I tried running each of the following a few
times:

gc(); system.time(for(i in 1:15)as.POSIXlt(paste(y,m,d, sep="-")))
gc(); system.time(for(i in 1:15)ymd.to.POSIXlt(y, m, d))

and noticed that the Vcells gc trigger and Mb used varied all over
the place.  Does that suggest anything?
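
For what it's worth, here is a minimal sketch of how one might record
the gc() figures alongside each timing, so the two can be compared run
by run.  It uses only base R (the paste/as.POSIXlt variant) and a
smaller n than in the original post; the helper name timed.with.gc is
made up for illustration and is not from anyone's posted code.

```r
# Hypothetical helper (not from the original post): call gc(), note the
# Vcells "gc trigger" value it reports, then time one call of f().
timed.with.gc <- function(f) {
    g <- gc()                                  # matrix with rows Ncells, Vcells
    trigger <- g["Vcells", "gc trigger"]       # trigger level before the run
    elapsed <- system.time(f())[["elapsed"]]   # wall-clock seconds
    c(elapsed = elapsed, vcells.trigger = trigger)
}

set.seed(1)
n <- 10000                                     # assumption: smaller than 100000
y <- sample(1970:2004, n, replace=TRUE)
m <- sample(1:12,      n, replace=TRUE)
d <- sample(1:28,      n, replace=TRUE)

# Five repeated runs; each row of res is one run.
res <- t(sapply(1:5, function(i)
    timed.with.gc(function() as.POSIXlt(paste(y, m, d, sep="-")))))
print(res)
```

If the trigger column moves around between rows while the elapsed
column does too, that would at least be consistent with memory
management playing a part, though it would not prove it.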

Patrick Connolly <p.connolly <at> hortresearch.co.nz> writes:

: 
: I tried the code that Richard O'Keefe posted last week, to wit:
: 
: library(chron)
:     ymd.to.POSIXlt <-
:         function (y, m, d) as.POSIXlt(chron(julian(y=y, x=m, d=d)))
:     n <- 100000
:     y <- sample(1970:2004, n, replace=TRUE)
:     m <- sample(1:12,      n, replace=TRUE)
:     d <- sample(1:28,      n, replace=TRUE)
:     system.time(ymd.to.POSIXlt(y, m, d))
:     [1]  8.78  0.10 31.76  0.00  0.00
:     system.time(as.POSIXlt(paste(y,m,d, sep="-")))
:     [1] 14.64  0.13 53.30  0.00  0.00
: 
: 
: On a somewhat newer machine, I got
: 
: $ R --vanilla
: 
: R : Copyright 2004, The R Foundation for Statistical Computing
: Version 1.9.0  (2004-04-12), ISBN 3-900051-00-3
: 
: [...]
: 
: > library(chron)
: >     ymd.to.POSIXlt <-
: +         function (y, m, d) as.POSIXlt(chron(julian(y=y, x=m, d=d)))
: >     n <- 100000
: >     y <- sample(1970:2004, n, replace=TRUE)
: >     m <- sample(1:12,      n, replace=TRUE)
: >     d <- sample(1:28,      n, replace=TRUE)
: > 
: > system.time(ymd.to.POSIXlt(y, m, d))
: [1] 1.67 0.24 2.01 0.00 0.00
: > system.time(as.POSIXlt(paste(y,m,d, sep="-")))
: [1] 3.06 0.02 3.08 0.00 0.00
: > 
: 
: But then I tried a few more times...
: 
: > system.time(ymd.to.POSIXlt(y, m, d))
: [1] 1.09 0.04 1.13 0.00 0.00
: > system.time(ymd.to.POSIXlt(y, m, d))
: [1] 1.11 0.09 1.20 0.00 0.00
: >
: 
: The second time is a lot faster, but subsequent ones don't "improve further".
: 
: But with the "standard" function,
: 
: > system.time(as.POSIXlt(paste(y,m,d, sep="-")))
: [1] 2.64 0.02 2.66 0.00 0.00
: > system.time(as.POSIXlt(paste(y,m,d, sep="-")))
: [1] 2.82 0.03 2.85 0.00 0.00
: >
: ... it does improve slightly, but a lot less.
: 
: THEN
: 
: If I compare the two methods in the reverse order,
: 
: $ R --vanilla
: 
: R : Copyright 2004, The R Foundation for Statistical Computing
: Version 1.9.0  (2004-04-12), ISBN 3-900051-00-3
: 
: [....]
: 
: > library(chron)
: >     ymd.to.POSIXlt <-
: +         function (y, m, d) as.POSIXlt(chron(julian(y=y, x=m, d=d)))
: >     n <- 100000
: >     y <- sample(1970:2004, n, replace=TRUE)
: >     m <- sample(1:12,      n, replace=TRUE)
: >     d <- sample(1:28,      n, replace=TRUE)
: > system.time(as.POSIXlt(paste(y,m,d, sep="-")))
: [1] 3.66 0.02 3.76 0.00 0.00
: > system.time(ymd.to.POSIXlt(y, m, d))
: [1] 1.65 0.05 1.70 0.00 0.00
: > 
: > 
: > system.time(as.POSIXlt(paste(y,m,d, sep="-")))
: [1] 2.59 0.02 2.61 0.00 0.00
: > system.time(as.POSIXlt(paste(y,m,d, sep="-")))
: [1] 2.73 0.00 2.74 0.00 0.00
: > 
: > system.time(ymd.to.POSIXlt(y, m, d))
: [1] 1.29 0.01 1.30 0.00 0.00
: > system.time(ymd.to.POSIXlt(y, m, d))
: [1] 0.94 0.00 0.94 0.00 0.00
: > system.time(ymd.to.POSIXlt(y, m, d))
: [1] 1.06 0.01 1.07 0.00 0.00
: > 
: 
: It seems as though the first simulation makes it "easier" for
: subsequent simulations of the same type AND for simulations of a
: somewhat different type.  The degree to which it "helps" varies
: according to just what is being run (no surprise there).  What I can't
: figure out is what is happening that makes it quicker for second and
: subsequent runs.
: 
: I even tried doing a gc() and setting seeds before each run to make a
: more direct comparison, but it made no difference other than being
: slightly less variable.  I have seen a similar phenomenon in other
: types of simulations.
: 
: In the case of this code, it makes no difference whether n is 100 or
: 10000000.  Would that be attributable to lazy evaluation?
: 
: > version
:          _                
: platform i686-pc-linux-gnu
: arch     i686             
: os       linux-gnu        
: system   i686, linux-gnu  
: status                    
: major    1                
: minor    9.0              
: year     2004             
: month    04               
: day      12               
: language R         
: 
: It's not exactly a problem, but it could have a bearing on comparing
: processing times which is something that happens from time to time.
: In the comparison that gave rise to the code above, the order would
: have made a substantial difference to the perceived effectiveness of
: Richard's code.
:
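
On the point about run order affecting comparisons, one possible way to
reduce the effect is to discard a warm-up run of each method, alternate
the order over several rounds, and compare medians.  The sketch below
is only that, a sketch under my own assumptions: it uses base R's
ISOdate() as a stand-in for the chron-based ymd.to.POSIXlt so that it
runs without extra packages, and the number of rounds and the value of
n are arbitrary.

```r
# Sketch only: interleave two methods over several rounds after a
# warm-up, then compare median elapsed times.
set.seed(42)
n <- 10000                                     # assumption: arbitrary size
y <- sample(1970:2004, n, replace=TRUE)
m <- sample(1:12,      n, replace=TRUE)
d <- sample(1:28,      n, replace=TRUE)

methods <- list(
    paste   = function() as.POSIXlt(paste(y, m, d, sep="-")),
    # ISOdate() is a base R stand-in for the chron-based ymd.to.POSIXlt
    isodate = function() as.POSIXlt(ISOdate(y, m, d))
)

# Warm-up: run each method once and throw the result away.
for (f in methods) f()

rounds <- 5
times <- matrix(NA_real_, nrow=rounds, ncol=length(methods),
                dimnames=list(NULL, names(methods)))
for (r in seq_len(rounds)) {
    # Reverse the order of the methods on every second round.
    ord <- if (r %% 2 == 1) names(methods) else rev(names(methods))
    for (nm in ord) {
        gc()                                   # collect before each timing
        times[r, nm] <- system.time(methods[[nm]]())[["elapsed"]]
    }
}
print(apply(times, 2, median))                 # median elapsed per method
```

This does not explain why the first run is slower, but it should make
the comparison less sensitive to which method happens to go first.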



