[R] Question on memory allocation & loop

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Jun 29 09:21:03 CEST 2006


On Thu, 29 Jun 2006, Manoj wrote:

> Hello All,
>      I am trying to work on writing the following piece of (pseudo)
> code in an optimal fashion:
>
> ----------------------------------------------------
> # Two data frames with some data
>
> a = data.frame(somedata)
> b = data.frame(somedata)
>
> for(i in 1:nrow(dt)) {
>  # Merge dates for a given date into a new data frame
>   c = merge(a[a$dt==dt[i],],b[b$dt == dt[i],], by=c(some column));
> }

Note that each pass through the loop overwrites 'c', so only the last
iteration of that loop is actually needed.

What are you really trying to do, and why are you worrying about memory?
For example, merge() in R-devel is a lot more efficient for some
operations, perhaps including your example.
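
If the intent was to keep the result for every date rather than just the last one, one possible approach is to collect the per-date merges in a list and stack them once at the end. The following is only a sketch under assumptions: the data frames, the column names 'dt', 'id', 'x' and 'y', and the object names 'pieces' and 'merged' are all made up for illustration and do not come from the original question.

```r
# Hypothetical example data: 'a' and 'b' share a date column 'dt'
# and a key column 'id' (all names are assumptions).
a <- data.frame(dt = c("2006-06-01", "2006-06-01", "2006-06-02", "2006-06-02"),
                id = c(1, 2, 1, 2),
                x  = c(10, 20, 30, 40))
b <- data.frame(dt = c("2006-06-01", "2006-06-01", "2006-06-02", "2006-06-02"),
                id = c(1, 2, 1, 3),
                y  = c(0.1, 0.2, 0.3, 0.4))
dates <- unique(a$dt)

# One merged data frame per date, kept in a list instead of being overwritten.
pieces <- lapply(seq_along(dates), function(i) {
  merge(a[a$dt == dates[i], ], b[b$dt == dates[i], ], by = c("dt", "id"))
})

# Stack the pieces once at the end.
merged <- do.call(rbind, pieces)

# For this particular pattern a single call yields the same rows.
merged_direct <- merge(a, b, by = c("dt", "id"))
```

When the per-date subsets are merged on the date as well, as assumed here, the single merge() call makes the loop unnecessary.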

> ----------------------------------------------------
>
>
> Now, my understanding is that the data frame c in the above code is
> malloc'ed on every iteration of the loop.  Is that assumption correct?

No.  Here 'c' is just a symbol, and assignment (please use <- in public 
code, it is easier to read) binds the symbol to the data frame returned by 
merge().  So the allocation (not 'malloc' necessarily) is going on inside 
merge(). Also, 'c' is a system object, so you are confusing people by 
using its name for your own object.
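
To illustrate the point about the name 'c', here is a small sketch (the data and the replacement name 'merged' are made up for illustration). A call such as c(1, 2, 3) still reaches the system function, because R skips non-function objects when looking up a name that is being called, but the code becomes harder to read.

```r
# Bind a data frame to the name 'c' in the workspace (illustrative only).
c <- data.frame(x = 1:3)

# This call still finds the base function, not the data frame.
v <- c(1, 2, 3)

is.data.frame(c)       # refers to the workspace object
is.function(base::c)   # the system function is untouched

# Remove the workspace object and use a less confusing name instead.
rm(c)
merged <- data.frame(x = 1:3)
```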

When you assign to 'c' you change the binding to a different already 
allocated object.  Eventually garbage collection will recover (to R) the 
memory allocated to objects which are no longer bound to symbols.
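
A minimal sketch of what rebinding looks like from the user's side (object names are made up for illustration; what R does internally may differ from what the comments suggest):

```r
x <- data.frame(v = 1:5)    # a data frame is created and bound to 'x'
y <- x                      # 'y' is bound to the same value
x <- data.frame(v = 6:10)   # 'x' is rebound to a different object

first_v <- y$v              # 'y' still gives access to the first data frame

# Once no symbol refers to the first data frame, it is eligible for
# garbage collection.
rm(y)
invisible(gc())             # explicit request; normally R decides when to collect
```

Calling gc() by hand is rarely needed; it is shown here only to make the collection step visible.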

I am not aware of any account which describes in detail how R works at 
this level, and end users do not need to know it.  (It is also the case 
that R maintains a number of illusions and internally may not do what it 
appears to do.)

>
> Is the following attempt a better way of doing things?
>
> ----------------------------------------------------
> a = data.frame(somedata)
> b = data.frame(somedata)
>
> # Pre-allocate data frame c
>
> c = data.frame(for some size);
>
> for(i in 1:nrow(dt)) {
>  # Merge dates for a given date into a new data frame
>   # and copy the result into c
>
>  copy(c, merge(a[a$dt==dt[i],],b[b$dt == dt[i],], by=c(some column)));
>
> }
> ----------------------------------------------------
>
> Now the question is, how can I copy the merged data into my
> pre-allocated data frame c? I tried rbind/cbind, but they are pretty
> fussy about having the right names and dimensions, hence it fails.
>
> Any help would be greatly appreciated!
>
> Thanks.
>
> Manoj
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


