[R] complexity of operations in R

Jan van der Laan rhelp at eoos.dds.nl
Thu Jul 19 22:00:17 CEST 2012


When the length of the end result is not known, doubling the length of 
the list is also much faster than increasing the size of the list with 
single items.

f <- function(n, preallocate) {
     v <- if(preallocate) vector("list",n) else list() ;
     for(i in seq_len(n)) {
        v[[i]] <- i
     }
     v
}

g <- function(n) {
     N <- 1000
     v <- vector("list", N)
     for(i in seq_len(n)) {
        if (i > N) {
            N <- 2 * N
            length(v) <- N
        }
        v[[i]] <- i
     }
     v[1:i]
}


 > system.time(f(5E4, TRUE))
    user  system elapsed
   0.968   0.000   0.975
 > system.time(f(5E4, FALSE))
    user  system elapsed
  52.611   0.136  54.197
 > system.time(g(5E4))
    user  system elapsed
   1.388   0.008   1.424


What causes these differences? I can imagine that the time needed for 
memory allocations play a role: multiple small allocations will be 
smaller than one large allocation. But that doesn't explain the 
quadratic growth in time. I would expect that to be linear. When doing 
v[[i]] <- i the list isn't copied, right?

Jan



On 07/19/2012 06:21 PM, William Dunlap wrote:
> Preallocation of lists does speed things up.  The following shows
> time quadratic in size when there is no preallocation and linear
> growth when there is, for size in the c. 10^4 to 10^6 region:
>> f <- function(n, preallocate) { v <- if(preallocate)vector("list",n) else list() ; for(i in seq_len(n)) v[[i]] <- i ; v }
>> identical(f(17,pre=TRUE), f(17,pre=FALSE))
> [1] TRUE
>> system.time(f(n=1e4, preallocate=FALSE))
>     user  system elapsed
>    0.324   0.000   0.326
>> system.time(f(n=2e4, preallocate=FALSE)) # 2x n, 4x time
>     user  system elapsed
>    1.316   0.012   1.329
>> system.time(f(n=4e4, preallocate=FALSE)) # ditto
>     user  system elapsed
>    5.720   0.028   5.754
>>
>> system.time(f(n=1e4, preallocate=TRUE))
>     user  system elapsed
>    0.016   0.000   0.017
>> system.time(f(n=2e4, preallocate=TRUE)) # 2x n, 2x time
>     user  system elapsed
>    0.032   0.004   0.036
>> system.time(f(n=4e4, preallocate=TRUE)) # ditto
>     user  system elapsed
>    0.068   0.000   0.069
>>
>> system.time(f(n=4e5, preallocate=TRUE)) # 10x n, 10x time
>     user  system elapsed
>    0.688   0.000   0.688
>
> Above 10^6 there is some superlinearity
>> system.time(f(n=4e6, preallocate=TRUE)) # 10x n, 20x time
>     user  system elapsed
>   11.125   0.052  11.181
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
>> Behalf Of Bert Gunter
>> Sent: Thursday, July 19, 2012 9:11 AM
>> To: Hadley Wickham
>> Cc: r-help at r-project.org
>> Subject: Re: [R] complexity of operations in R
>>
>> Hadley et. al:
>>
>> Indeed. And using a loop is a poor way to do it anyway.
>>
>> v <- as.list(rep(FALSE,dotot))
>>
>> is way faster.
>>
>> -- Bert
>>
>> On Thu, Jul 19, 2012 at 8:50 AM, Hadley Wickham <hadley at rice.edu> wrote:
>>> On Thu, Jul 19, 2012 at 8:02 AM, Jan van der Laan <rhelp at eoos.dds.nl> wrote:
>>>> Johan,
>>>>
>>>> Your 'list' and 'array doubling' code can be written much more efficient.
>>>>
>>>> The following function is faster than your g and easier to read:
>>>>
>>>> g2 <- function(dotot) {
>>>>    v <- list()
>>>>    for (i in seq_len(dotot)) {
>>>>      v[[i]] <- FALSE
>>>>    }
>>>> }
>>>
>>> Except that you don't need to pre-allocate lists...
>>>
>>> Hadley
>>>
>>> --
>>> Assistant Professor / Dobelman Family Junior Chair
>>> Department of Statistics / Rice University
>>> http://had.co.nz/
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>>
>> Internal Contact Info:
>> Phone: 467-7374
>> Website:
>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-
>> biostatistics/pdb-ncb-home.htm
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list