[R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

Heinz Tuechler tuechler at gmx.at
Wed Oct 30 01:00:24 CET 2013


Dear All,

Is it known that source() runs much faster in R 2.15.2 than in R 3.0.2?
In the example below, sourcing a dumped data.frame with 10^7 rows takes
about nine times longer in R 3.0.2 (elapsed 566.41 s versus 62.26 s):

R version 2.15.2 Patched (2012-11-29 r61184)
length: 1e+07
    user  system elapsed
   62.04    0.22   62.26

R version 3.0.2 Patched (2013-10-27 r64116)
length: 1e+07
    user  system elapsed
  388.63  176.42  566.41

Is there a way to speed R version 3.0.2 up to the performance of R 
version 2.15.2?

best regards,

Heinz Tüchler


example:
sessionInfo()
sample.vec <-
   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
     'named', 'file', 'or', 'URL', 'or', 'connection')
dmp.size <- c(10^(1:7))
set.seed(37)

for(i in dmp.size) {
   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
   dump('df0', file='testdump')
   cat('length:', i, '\n')
   print(system.time(source('testdump', keep.source = FALSE,
                            encoding='')))
}
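
Since source() first parses the file and then evaluates the resulting expressions, timing the two stages separately might show where the extra time goes. The following is only a minimal sketch and was not part of the runs reported here; the names dump.file, exprs, t.parse and t.eval are made up for illustration, and the row count is kept small so that it finishes quickly.

```r
# Sketch: time the parse and eval stages of source() separately.
# dump.file, exprs, t.parse and t.eval are illustrative names only.
set.seed(37)
sample.vec <-
   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
     'named', 'file', 'or', 'URL', 'or', 'connection')
n <- 1e4   # small size for a quick check; raise it to reproduce the slowdown

df0 <- data.frame(x = sample(sample.vec, n, replace = TRUE))
dump.file <- tempfile('testdump')
dump('df0', file = dump.file)
rm(df0)    # remove the original so that eval() below has to recreate it

# Stage 1: turn the text of the dump file into unevaluated expressions
t.parse <- system.time(exprs <- parse(file = dump.file, keep.source = FALSE))

# Stage 2: evaluate the expressions in the global environment,
# which is where source() evaluates by default
t.eval <- system.time(for (e in exprs) eval(e, envir = globalenv()))

cat('parse:\n'); print(t.parse)
cat('eval:\n');  print(t.eval)

unlink(dump.file)   # clean up the temporary file
```

If one stage dominates at 10^6 or 10^7 rows in R 3.0.2 but not in R 2.15.2, that would narrow down which part of source() changed.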

output for R version 2.15.2 Patched (2012-11-29 r61184):
> sessionInfo()
R version 2.15.2 Patched (2012-11-29 r61184)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
> sample.vec <-
+   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
+     'named', 'file', 'or', 'URL', 'or', 'connection')
> dmp.size <- c(10^(1:7))
> set.seed(37)
>
> for(i in dmp.size) {
+   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
+   dump('df0', file='testdump')
+   cat('length:', i, '\n')
+   print(system.time(source('testdump', keep.source = FALSE,
+                            encoding='')))
+ }
length: 10
    user  system elapsed
       0       0       0
length: 100
    user  system elapsed
       0       0       0
length: 1000
    user  system elapsed
       0       0       0
length: 10000
    user  system elapsed
    0.02    0.00    0.01
length: 1e+05
    user  system elapsed
    0.21    0.00    0.20
length: 1e+06
    user  system elapsed
    4.47    0.04    4.51
length: 1e+07
    user  system elapsed
   62.04    0.22   62.26
>


output for R version 3.0.2 Patched (2013-10-27 r64116):
> sessionInfo()
R version 3.0.2 Patched (2013-10-27 r64116)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
> sample.vec <-
+   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
+     'named', 'file', 'or', 'URL', 'or', 'connection')
> dmp.size <- c(10^(1:7))
> set.seed(37)
>
> for(i in dmp.size) {
+   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
+   dump('df0', file='testdump')
+   cat('length:', i, '\n')
+   print(system.time(source('testdump', keep.source = FALSE,
+                            encoding='')))
+ }
length: 10
    user  system elapsed
       0       0       0
length: 100
    user  system elapsed
       0       0       0
length: 1000
    user  system elapsed
       0       0       0
length: 10000
    user  system elapsed
    0.01    0.00    0.01
length: 1e+05
    user  system elapsed
    0.36    0.06    0.42
length: 1e+06
    user  system elapsed
    6.02    1.86    7.88
length: 1e+07
    user  system elapsed
  388.63  176.42  566.41
>



-- 
Heinz Tüchler +4317146261 / +436605653878


