[R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
Prof Brian Ripley
ripley at stats.ox.ac.uk
Thu Oct 31 09:12:33 CET 2013
On 30/10/2013 21:15, William Dunlap wrote:
> I have to defer to others for policy declarations like how long
> the current format used by load and save should be readable.
You could also ask how long R will last ...
R can still read (but not write) save() formats used in the 1990s. We
would expect R to be able to read files saved by any version since R 1.0.0
for as long as R exists. And as R is Open Source, you would be able to
compile it and dump the objects you want for as long as suitable compilers
and OSes exist ... And of course R is not the only application that will
read the format.
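For the storage question raised further down the thread, here is a minimal sketch of the save()/load() route. It writes a data.frame whose variables carry extra attributes to a temporary file and checks that everything comes back unchanged. The attribute names "label" and "note" are made up for illustration. save() also accepts ascii = TRUE if a text representation is preferred.

```r
# Sketch: a data.frame whose variables carry extra attributes.
# The "label" and "note" attributes are hypothetical examples.
df0 <- data.frame(x = c(1.5, 2.5, 3.5), g = factor(c("a", "b", "a")))
attr(df0$x, "label") <- "measurement"
attr(df0$g, "note") <- "group code"

tf <- tempfile(fileext = ".RData")
save(df0, file = tf)

# load() into a fresh environment so the original df0 is not overwritten
restored <- new.env()
load(tf, envir = restored)
unlink(tf)

# TRUE if values, classes and all attributes survived the round trip
identical(df0, restored$df0)
```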
There is no guarantee that source() will be able to parse dumps made by
earlier versions of R, and in the past that has not always been the case.
People commenting on parse() speed should note the NEWS for R-devel:
• The parser has been modified to use less memory.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>> -----Original Message-----
>> From: Heinz Tuechler [mailto:tuechler at gmx.at]
>> Sent: Wednesday, October 30, 2013 1:43 PM
>> To: William Dunlap
>> Cc: Carl Witthoft; r-help at r-project.org
>> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
>>
>> Many thanks for confirming my impression. I use dump for storing large
>> data.frames with a number of attributes for each variable. save/load is
>> much faster, but I am unsure whether such files will be readable by R
>> versions years from now.
>> What format/functions would you suggest for data storage/transfer
>> between different (future) R versions?
>>
>> best regards,
>> Heinz
>>
>> On 30.10.2013 20:11, William Dunlap wrote:
>>> I see a big 2.15.2/3.0.2 speed difference in parse() (which is used by source())
>>> when it is parsing long vectors of numeric data. dump/source has never been an
>>> efficient way of transferring data between different R sessions, but it is much
>>> worse now for long vectors. In 2.15.2, doubling the size of the vector (for lengths
>>> in the range 10^4 to 10^7) makes the time to parse go up by a factor of c. 2.1,
>>> i.e. roughly linear growth. In 3.0.2 that factor is more like 4.4, i.e. roughly
>>> quadratic growth.
>>>
>>>        n  elapsed-2.15.2  elapsed-3.0.2
>>>     2048           0.003          0.018
>>>     4096           0.006          0.065
>>>     8192           0.013          0.254
>>>    16384           0.025          1.067
>>>    32768           0.050          4.114
>>>    65536           0.100         16.236
>>>   131072           0.219         66.013
>>>   262144           0.808        291.883
>>>   524288           2.022       1285.265
>>>  1048576           4.918             NA
>>>  2097152           9.857             NA
>>>  4194304          22.916             NA
>>>  8388608          49.671             NA
>>> 16777216         101.042             NA
>>> 33554432         512.719             NA
>>>
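To make the doubling factors concrete, here is a small sketch that recomputes them from the elapsed times in the table above. If parse time grows like n^k, doubling n multiplies the time by 2^k, so log2 of each successive ratio estimates k. The variable and helper names are illustrative only.

```r
# Elapsed seconds from the table above (n = 2^11, ..., 2^25);
# NA marks 3.0.2 runs that did not finish.
t_2152 <- c(0.003, 0.006, 0.013, 0.025, 0.050, 0.100, 0.219, 0.808,
            2.022, 4.918, 9.857, 22.916, 49.671, 101.042, 512.719)
t_302 <- c(0.018, 0.065, 0.254, 1.067, 4.114, 16.236, 66.013, 291.883,
           1285.265, rep(NA, 6))

# Ratio of each time to the time for half that length
doubling_factor <- function(t) t[-1] / t[-length(t)]

# If time ~ n^k, doubling n multiplies time by 2^k, so log2(ratio) ~ k
k_2152 <- median(log2(doubling_factor(t_2152)), na.rm = TRUE)
k_302 <- median(log2(doubling_factor(t_302)), na.rm = TRUE)

round(c(R_2.15.2 = k_2152, R_3.0.2 = k_302), 2)
```

With these numbers the median estimates come out close to k = 1 for 2.15.2 and k = 2 for 3.0.2, consistent with the linear versus quadratic reading of the factors 2.1 and 4.4.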
>>> I tried this with 64-bit R on a Linux box. The NAs represent sizes that did not
>>> finish while I was at a 1.5-hour dentist's appointment. The timing function
>>> was:
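>>> test <- function(n = 2^(11:25))
>>> {
>>>     tf <- tempfile()
>>>     on.exit(unlink(tf))
>>>     t(sapply(n, function(n){
>>>         # write log(1:n) as R source text, then time parsing it back
>>>         dput(log(seq_len(n)), file=tf)
>>>         # [1:3] keeps the user, system and elapsed times
>>>         print(c(n=n, system.time(parse(file=tf))[1:3]))
>>>     }))
>>> }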
>>>
>>> Bill Dunlap
>>> Spotfire, TIBCO Software
>>> wdunlap tibco.com
>>>
>>>
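Bill's point that dump/source is an inefficient transfer route, and Heinz's remark that save/load is much faster, can be compared on a single vector. This is a minimal sketch, not a benchmark: n = 2^16 is an arbitrary choice, timings will depend on the machine and R version, and saveRDS()/readRDS() are used as the single-object binary counterpart of save()/load().

```r
# Sketch: text route (dput() + parse(), as in test() above) versus
# binary route (saveRDS() + readRDS()) for one vector length.
n <- 2^16
x <- log(seq_len(n))

tf_txt <- tempfile(fileext = ".R")
tf_rds <- tempfile(fileext = ".rds")
dput(x, file = tf_txt)
saveRDS(x, file = tf_rds)

# Assignments inside system.time() happen in the calling environment
t_txt <- system.time(x_txt <- eval(parse(file = tf_txt)[[1]]))[["elapsed"]]
t_rds <- system.time(x_rds <- readRDS(tf_rds))[["elapsed"]]
unlink(c(tf_txt, tf_rds))

print(c(dput_parse = t_txt, saveRDS_readRDS = t_rds))

# readRDS() returns the doubles exactly; dput() writes a limited number
# of significant digits by default, so the parsed copy is only all.equal().
identical(x_rds, x)
isTRUE(all.equal(x_txt, x))
```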
>>>> -----Original Message-----
>>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Carl Witthoft
>>>> Sent: Wednesday, October 30, 2013 5:29 AM
>>>> To: r-help at r-project.org
>>>> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
>>>>
>>>> Did you run the same code on the same machine, and did you verify that
>>>> no other tasks were running that might have limited the RAM available
>>>> to R? Equally important, did you run these tests in the reverse order
>>>> (in case R was storing large objects from the first run, thus chewing up
>>>> RAM)?
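Carl's reverse-order check can be scripted. This is a minimal sketch, assuming the dput()/parse() workload from Bill's test() above; time_parse is a hypothetical helper, and the sizes are kept small so it finishes quickly. system.time() already runs a garbage collection before timing by default (gcFirst = TRUE), which reduces but does not rule out carry-over between runs.

```r
# Hypothetical helper: write log(1:n) with dput() and time parse() on it
time_parse <- function(n) {
  tf <- tempfile()
  on.exit(unlink(tf))
  dput(log(seq_len(n)), file = tf)
  # gcFirst = TRUE (the default) collects garbage before timing starts
  system.time(parse(file = tf), gcFirst = TRUE)[["elapsed"]]
}

sizes <- 2^(11:14)

# Time the sizes smallest-first, then largest-first; a large disagreement
# between the two orders would suggest that earlier runs affect later ones.
forward <- vapply(sizes, time_parse, numeric(1))
reverse <- rev(vapply(rev(sizes), time_parse, numeric(1)))

rbind(n = sizes, forward = forward, reverse = reverse)
```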
>>>>
>>>>
>>>>
>>>> Dear All,
>>>>
>>>> Is it known that source() works much faster in R 2.15.2 than in R 3.0.2?
>>>> In the example below I observe, e.g. for a data.frame with 10^7 rows, the
>>>> following timings:
>>>>
>>>> R version 2.15.2 Patched (2012-11-29 r61184)
>>>> length: 1e+07
>>>> user system elapsed
>>>> 62.04 0.22 62.26
>>>>
>>>> R version 3.0.2 Patched (2013-10-27 r64116)
>>>> length: 1e+07
>>>> user system elapsed
>>>> 388.63 176.42 566.41
>>>>
>>>> Is there a way to speed R version 3.0.2 up to the performance of R
>>>> version 2.15.2?
>>>>
>>>> best regards,
>>>>
>>>> Heinz Tüchler
>>>>
>>>>
>>>> example:
>>>> sessionInfo()
>>>> sample.vec <-
>>>>     c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
>>>>       'named', 'file', 'or', 'URL', 'or', 'connection')
>>>> dmp.size <- c(10^(1:7))
>>>> set.seed(37)
>>>>
>>>> for(i in dmp.size) {
>>>>     df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
>>>>     dump('df0', file='testdump')
>>>>     cat('length:', i, '\n')
>>>>     print(system.time(source('testdump', keep.source = FALSE,
>>>>                              encoding='')))
>>>> }
>>>>
>>>> output for R version 2.15.2 Patched (2012-11-29 r61184):
>>>>> sessionInfo()
>>>> R version 2.15.2 Patched (2012-11-29 r61184)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252
>>>> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
>>>> [5] LC_TIME=German_Switzerland.1252
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>> sample.vec <-
>>>> + c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
>>>> + 'named', 'file', 'or', 'URL', 'or', 'connection')
>>>>> dmp.size <- c(10^(1:7))
>>>>> set.seed(37)
>>>>>
>>>>> for(i in dmp.size) {
>>>> + df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
>>>> + dump('df0', file='testdump')
>>>> + cat('length:', i, '\n')
>>>> + print(system.time(source('testdump', keep.source = FALSE,
>>>> + encoding='')))
>>>> + }
>>>> length: 10
>>>> user system elapsed
>>>> 0 0 0
>>>> length: 100
>>>> user system elapsed
>>>> 0 0 0
>>>> length: 1000
>>>> user system elapsed
>>>> 0 0 0
>>>> length: 10000
>>>> user system elapsed
>>>> 0.02 0.00 0.01
>>>> length: 1e+05
>>>> user system elapsed
>>>> 0.21 0.00 0.20
>>>> length: 1e+06
>>>> user system elapsed
>>>> 4.47 0.04 4.51
>>>> length: 1e+07
>>>> user system elapsed
>>>> 62.04 0.22 62.26
>>>>>
>>>>
>>>>
>>>> output for R version 3.0.2 Patched (2013-10-27 r64116):
>>>>> sessionInfo()
>>>> R version 3.0.2 Patched (2013-10-27 r64116)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252
>>>> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
>>>> [5] LC_TIME=German_Switzerland.1252
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>> sample.vec <-
>>>> + c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
>>>> + 'named', 'file', 'or', 'URL', 'or', 'connection')
>>>>> dmp.size <- c(10^(1:7))
>>>>> set.seed(37)
>>>>>
>>>>> for(i in dmp.size) {
>>>> + df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
>>>> + dump('df0', file='testdump')
>>>> + cat('length:', i, '\n')
>>>> + print(system.time(source('testdump', keep.source = FALSE,
>>>> + encoding='')))
>>>> + }
>>>> length: 10
>>>> user system elapsed
>>>> 0 0 0
>>>> length: 100
>>>> user system elapsed
>>>> 0 0 0
>>>> length: 1000
>>>> user system elapsed
>>>> 0 0 0
>>>> length: 10000
>>>> user system elapsed
>>>> 0.01 0.00 0.01
>>>> length: 1e+05
>>>> user system elapsed
>>>> 0.36 0.06 0.42
>>>> length: 1e+06
>>>> user system elapsed
>>>> 6.02 1.86 7.88
>>>> length: 1e+07
>>>> user system elapsed
>>>> 388.63 176.42 566.41
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595