[R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

Heinz Tuechler tuechler at gmx.at
Thu Oct 31 11:25:18 CET 2013


On 31.10.2013 09:12, Prof Brian Ripley wrote:
> On 30/10/2013 21:15, William Dunlap wrote:
>> I have to defer to others for policy declarations like how long
>> the current format used by load and save should be readable.
>
> You could also ask how long R will last ....
>
> R can still read (but not write) save() formats used in the 1990's.  We
> would expect R to be able to read saves since R 1.0.0 for as long as R
> exists.  And as R is Open Source, you would be able to compile it and
> dump the objects you want for as long as suitable compilers and OSes
> exist ....  And of course R is not the only application which will read
> the format.
>
> There is no guarantee that source() will be able to parse dumps from
> earlier versions of R, and that has not always been true.
>
> People commenting on parse() speed should note the NEWS for R-devel:
>
>      • The parser has been modified to use less memory.
>
>
Thank you for the hint.
It appears to me that source() in R-devel performs at about the same 
speed as in R 2.15.2.
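
On the storage question discussed in this thread, here is a minimal
sketch of the two routes, save()/load() and dump()/source(), applied to
a small data.frame with a per-variable attribute. The object df0 mirrors
the name used in my example; the attribute name "label", its value and
the temporary files are assumptions made up for illustration. It only
shows a round trip within one R version and says nothing about
readability across versions.

```r
# Hypothetical example data: a data.frame whose column carries an attribute.
df0 <- data.frame(x = c("source", "causes", "R"), stringsAsFactors = FALSE)
attr(df0$x, "label") <- "example label"

# Route 1: binary save()/load(), restored into a separate environment.
f_rda <- tempfile(fileext = ".RData")
save(df0, file = f_rda)
e1 <- new.env()
load(f_rda, envir = e1)

# Route 2: text dump()/source(), also restored into a separate environment.
f_dmp <- tempfile(fileext = ".R")
dump("df0", file = f_dmp, envir = environment())
e2 <- new.env()
source(f_dmp, local = e2)

# Compare the restored objects with the original.
same_rda <- identical(df0, e1$df0)
same_dmp <- isTRUE(all.equal(df0, e2$df0))
print(c(save_load = same_rda, dump_source = same_dmp))

unlink(c(f_rda, f_dmp))
```

Restoring into new.env() rather than the workspace avoids overwriting
the original df0, so the comparison is meaningful.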
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>>
>>> -----Original Message-----
>>> From: Heinz Tuechler [mailto:tuechler at gmx.at]
>>> Sent: Wednesday, October 30, 2013 1:43 PM
>>> To: William Dunlap
>>> Cc: Carl Witthoft; r-help at r-project.org
>>> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R
>>> 3.0.2 ?
>>>
>>> Many thanks for confirming my impression. I use dump for storing large
>>> data.frames with a number of attributes for each variable. save/load is
>>> much faster, but I am unsure whether such files will be readable by R
>>> versions years later.
>>> What format/functions would you suggest for data storage/transfer
>>> between different (future) R versions?
>>>
>>> best regards,
>>> Heinz
>>>
>>> On 30.10.2013 20:11, William Dunlap wrote:
>>>> I see a big 2.15.2/3.0.2 speed difference in parse() (which is used
>>>> by source()) when it is parsing long vectors of numeric data.
>>>> dump/source has never been an efficient way of transferring data
>>>> between different R sessions, but it is much worse now for long
>>>> vectors.  In 2.15.2 doubling the size of the vector (of lengths in
>>>> the range 10^4 to 10^7) makes the time to parse go up by a factor
>>>> of c. 2.1.  In 3.0.2 that factor is more like 4.4.
>>>>
>>>>          n elapsed-2.15.2 elapsed-3.0.2
>>>>       2048          0.003         0.018
>>>>       4096          0.006         0.065
>>>>       8192          0.013         0.254
>>>>      16384          0.025         1.067
>>>>      32768          0.050         4.114
>>>>      65536          0.100        16.236
>>>>     131072          0.219        66.013
>>>>     262144          0.808       291.883
>>>>     524288          2.022      1285.265
>>>>    1048576          4.918            NA
>>>>    2097152          9.857            NA
>>>>    4194304         22.916            NA
>>>>    8388608         49.671            NA
>>>> 16777216        101.042            NA
>>>> 33554432        512.719            NA
>>>>
>>>> I tried this with 64-bit R on a Linux box.  The NAs represent sizes
>>>> that did not finish while I was at a 1 1/2 hour dentist's appointment.
>>>> The timing function was:
>>>>     test <- function(n = 2^(11:25))
>>>>     {
>>>>         tf <- tempfile()
>>>>         on.exit(unlink(tf))
>>>>         t(sapply(n, function(n){
>>>>             dput(log(seq_len(n)), file=tf)
>>>>             print(c(n=n, system.time(parse(file=tf))[1:3]))
>>>>         }))
>>>>     }
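
The per-doubling factors can be recomputed from the table above. This
is a minimal sketch: the timings are copied from the table, while the
variable names and the choice of a geometric mean as the summary are
assumptions of mine, not part of the original analysis. A factor near 2
per doubling corresponds to roughly linear growth in n, a factor near 4
to roughly quadratic growth.

```r
# Vector lengths and elapsed times (seconds) copied from the table above.
n <- 2^(11:25)
elapsed_2152 <- c(0.003, 0.006, 0.013, 0.025, 0.050, 0.100, 0.219, 0.808,
                  2.022, 4.918, 9.857, 22.916, 49.671, 101.042, 512.719)
elapsed_302  <- c(0.018, 0.065, 0.254, 1.067, 4.114, 16.236, 66.013,
                  291.883, 1285.265, rep(NA, 6))  # NA: run did not finish

# Ratio of each timing to the previous one, i.e. the cost of doubling n.
step_2152 <- elapsed_2152[-1] / elapsed_2152[-length(n)]
step_302  <- elapsed_302[-1]  / elapsed_302[-length(n)]

# Geometric mean of the available ratios (NAs dropped).
geo_mean <- function(x) exp(mean(log(x[!is.na(x)])))
gm_2152 <- geo_mean(step_2152)
gm_302  <- geo_mean(step_302)

print(data.frame(n = n[-1], step_2152 = round(step_2152, 2),
                 step_302 = round(step_302, 2)))
print(round(c(R_2.15.2 = gm_2152, R_3.0.2 = gm_302), 2))
```

The per-step ratios are printed alongside the summary because the
factors vary with n, so a single average over the whole table need not
match the figures quoted for a particular range of lengths.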
>>>>
>>>> Bill Dunlap
>>>> Spotfire, TIBCO Software
>>>> wdunlap tibco.com
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: r-help-bounces at r-project.org
>>>>> [mailto:r-help-bounces at r-project.org] On Behalf Of Carl Witthoft
>>>>> Sent: Wednesday, October 30, 2013 5:29 AM
>>>>> To: r-help at r-project.org
>>>>> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R
>>>>> 3.0.2 ?
>>>>>
>>>>> Did you run the identical code on the identical machine, and did
>>>>> you verify there were no other tasks running which might have
>>>>> limited the RAM available to R?  And equally important, did you run
>>>>> these tests in the reverse order (in case R was storing large
>>>>> objects from the first run, thus chewing up RAM)?
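
One way to check for such order effects is to time the same sizes in
ascending and in descending order and compare. The following is only a
sketch: the helper time_parse(), the small sizes and the layout of the
result are assumptions chosen so that it runs quickly, and it uses the
dput()/parse() workload from the timing function quoted earlier rather
than my data.frame example.

```r
# Time parse() on a file holding a dput() of n doubles.
time_parse <- function(n) {
  tf <- tempfile()
  on.exit(unlink(tf))
  dput(log(seq_len(n)), file = tf)
  # system.time() triggers a garbage collection first by default
  # (gcFirst = TRUE), which reduces carry-over from earlier runs.
  system.time(parse(file = tf))[["elapsed"]]
}

sizes <- c(1000, 2000, 4000)  # small hypothetical sizes, quick to run

# Same sizes, once in ascending and once in descending order.
forward  <- sapply(sizes, time_parse)
backward <- rev(sapply(rev(sizes), time_parse))

timings <- data.frame(n = sizes, forward = forward, backward = backward)
print(timings)
```

If the forward and backward columns differ markedly for the same n, the
order of the runs matters and the timings should be repeated in fresh
sessions.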
>>>>>
>>>>>
>>>>>
>>>>> Dear All,
>>>>>
>>>>> Is it known that source works much faster in R 2.15.2 than in R
>>>>> 3.0.2?  In the example below I observe e.g. for a data.frame with
>>>>> 10^7 rows the following timings:
>>>>>
>>>>> R version 2.15.2 Patched (2012-11-29 r61184)
>>>>> length: 1e+07
>>>>>       user  system elapsed
>>>>>      62.04    0.22   62.26
>>>>>
>>>>> R version 3.0.2 Patched (2013-10-27 r64116)
>>>>> length: 1e+07
>>>>>       user  system elapsed
>>>>>     388.63  176.42  566.41
>>>>>
>>>>> Is there a way to speed R version 3.0.2 up to the performance of R
>>>>> version 2.15.2?
>>>>>
>>>>> best regards,
>>>>>
>>>>> Heinz Tüchler
>>>>>
>>>>>
>>>>> example:
>>>>> sessionInfo()
>>>>> sample.vec <-
>>>>>      c('source', 'causes', 'R', 'to', 'accept', 'its', 'input',
>>>>>        'from', 'the', 'named', 'file', 'or', 'URL', 'or',
>>>>>        'connection')
>>>>> dmp.size <- c(10^(1:7))
>>>>> set.seed(37)
>>>>>
>>>>> for(i in dmp.size) {
>>>>>      df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
>>>>>      dump('df0', file='testdump')
>>>>>      cat('length:', i, '\n')
>>>>>      print(system.time(source('testdump', keep.source = FALSE,
>>>>>                               encoding='')))
>>>>> }
>>>>>
>>>>> output for R version 2.15.2 Patched (2012-11-29 r61184):
>>>>>> sessionInfo()
>>>>> R version 2.15.2 Patched (2012-11-29 r61184)
>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] LC_COLLATE=German_Switzerland.1252
>>>>> LC_CTYPE=German_Switzerland.1252
>>>>> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
>>>>> [5] LC_TIME=German_Switzerland.1252
>>>>>
>>>>> attached base packages:
>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>> sample.vec <-
>>>>> +   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from',
>>>>> +     'the', 'named', 'file', 'or', 'URL', 'or', 'connection')
>>>>>> dmp.size <- c(10^(1:7))
>>>>>> set.seed(37)
>>>>>>
>>>>>> for(i in dmp.size) {
>>>>> +   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
>>>>> +   dump('df0', file='testdump')
>>>>> +   cat('length:', i, '\n')
>>>>> +   print(system.time(source('testdump', keep.source = FALSE,
>>>>> +                            encoding='')))
>>>>> + }
>>>>> length: 10
>>>>>       user  system elapsed
>>>>>          0       0       0
>>>>> length: 100
>>>>>       user  system elapsed
>>>>>          0       0       0
>>>>> length: 1000
>>>>>       user  system elapsed
>>>>>          0       0       0
>>>>> length: 10000
>>>>>       user  system elapsed
>>>>>       0.02    0.00    0.01
>>>>> length: 1e+05
>>>>>       user  system elapsed
>>>>>       0.21    0.00    0.20
>>>>> length: 1e+06
>>>>>       user  system elapsed
>>>>>       4.47    0.04    4.51
>>>>> length: 1e+07
>>>>>       user  system elapsed
>>>>>      62.04    0.22   62.26
>>>>>>
>>>>>
>>>>>
>>>>> output for R version 3.0.2 Patched (2013-10-27 r64116):
>>>>>> sessionInfo()
>>>>> R version 3.0.2 Patched (2013-10-27 r64116)
>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] LC_COLLATE=German_Switzerland.1252
>>>>> LC_CTYPE=German_Switzerland.1252
>>>>> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
>>>>> [5] LC_TIME=German_Switzerland.1252
>>>>>
>>>>> attached base packages:
>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>> sample.vec <-
>>>>> +   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from',
>>>>> +     'the', 'named', 'file', 'or', 'URL', 'or', 'connection')
>>>>>> dmp.size <- c(10^(1:7))
>>>>>> set.seed(37)
>>>>>>
>>>>>> for(i in dmp.size) {
>>>>> +   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
>>>>> +   dump('df0', file='testdump')
>>>>> +   cat('length:', i, '\n')
>>>>> +   print(system.time(source('testdump', keep.source = FALSE,
>>>>> +                            encoding='')))
>>>>> + }
>>>>> length: 10
>>>>>       user  system elapsed
>>>>>          0       0       0
>>>>> length: 100
>>>>>       user  system elapsed
>>>>>          0       0       0
>>>>> length: 1000
>>>>>       user  system elapsed
>>>>>          0       0       0
>>>>> length: 10000
>>>>>       user  system elapsed
>>>>>       0.01    0.00    0.01
>>>>> length: 1e+05
>>>>>       user  system elapsed
>>>>>       0.36    0.06    0.42
>>>>> length: 1e+06
>>>>>       user  system elapsed
>>>>>       6.02    1.86    7.88
>>>>> length: 1e+07
>>>>>       user  system elapsed
>>>>>     388.63  176.42  566.41
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>
>>
>
>


