[R] Pre-allocation of matrices is LESS efficient?

Douglas Bates bates at stat.wisc.edu
Thu Feb 17 19:25:12 CET 2011


On Thu, Feb 17, 2011 at 10:02 AM, Alex F. Bokov
<ahupxot02 at sneakemail.com> wrote:
> Motivation: during each iteration, my code needs to collect tabular data (and use it only during that iteration), but the rows of data may vary. I thought I would speed it up by preinitializing the matrix that collects the data with zeros to what I know to be the maximum number of rows. I was surprised by what I found...
>
> # set up (not the puzzling part)
> x<-matrix(runif(20),nrow=4); y<-matrix(0,nrow=12,ncol=5); foo<-c();

There is no need to initialize foo here.  The assignment in the
second version replaces whatever value foo holds at that point.
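
A small sketch of that point, using the same x as in the set-up line (the
row count 3 is an arbitrary choice for illustration): c() with no
arguments is just NULL, and a plain assignment rebinds the name without
regard to what it held before.

```r
# c() with no arguments returns NULL, so this initialization stores nothing useful.
foo <- c()
print(is.null(foo))

x <- matrix(runif(20), nrow = 4)

# A plain assignment rebinds the name 'foo'; the earlier NULL plays no part.
foo <- x[1:3, ]
print(dim(foo))
```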

> # this is what surprises me... what the?
>> system.time(for(i in 1:100000){n<-sample(1:4,1);y[1:n,]<-x[1:n,];});
>   user  system elapsed
>  1.510   0.000   1.514

This version extracts rows from x and then assigns them into a
submatrix of y, so each iteration pays for a call to `[<-` as well as
for the extraction.  The second version performs only the extraction
and binds the result to a name in the evaluation environment, which is
a much faster operation.
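
One way to check this is to time the two loops on identical work. The
following is only a sketch: the seed, the smaller iteration count
(n_iter) and the pre-drawn vector of row counts (ns) are assumptions
added here so that both loops see the same sequence of n and the
sampling cost is kept out of the timings. Actual timings will vary by
machine and R version.

```r
set.seed(1)                          # assumption: seed added only for repeatability
x <- matrix(runif(20), nrow = 4)
y <- matrix(0, nrow = 12, ncol = 5)
n_iter <- 10000L                     # assumption: fewer iterations than the original 100000

# Draw the row counts once so both loops do exactly the same extractions.
ns <- sample(1:4, n_iter, replace = TRUE)

# (a) extraction from x followed by assignment into a submatrix of y
t_sub <- system.time(for (n in ns) y[1:n, ] <- x[1:n, ])

# (b) extraction from x followed by binding the result to the name 'foo'
t_name <- system.time(for (n in ns) foo <- x[1:n, ])

# Show user, system and elapsed times side by side.
print(rbind(subassign = t_sub[1:3], rebind = t_name[1:3]))
```

Any difference between the two rows is then attributable to the
sub-assignment step alone, since the extraction x[1:n, ] is common to
both loops.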

>> system.time(for(i in 1:100000){n<-sample(1:4,1);foo<-x[1:n,];});
>   user  system elapsed
>  1.090   0.000   1.085
>
> These results are very repeatable. So, if I'm interpreting them correctly, dynamically allocating 'foo' each time to whatever the current output size is runs faster than writing to a subset of a preallocated 'y'? How is that possible?
>
> And, more generally, I'm sure other people have encountered this type of situation. Am I reinventing the wheel? Is there a best practice for storing temporary loop-specific data?
>
> Thanks.
>
> PS:  By the way, though I cannot write to foo[,] because the size is different each time, I tried writing to foo[] and the runtime was worse than either of the above examples.