[R] Memory management

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Apr 11 15:17:04 CEST 2007


Start with the 'R Internals' manual.  R has 'call by value' semantics, but 
uses lazy copying: the idea is to make a copy only when an object is 
changed while there are still references to the original version, although 
that idea is only partially implemented.
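A minimal sketch of what lazy copying means in practice, using base R's
tracemem() (available when R is built with memory profiling, as the
standard binaries are).  The function names f and g are illustrative:

```r
## Passing a large vector to a function does not copy it; a duplicate
## is made only when someone modifies the object.
x <- rep(1.1, 1e6)
tracemem(x)                        # report whenever 'x' is duplicated

f <- function(v) sum(v)            # read-only use: no copy is reported
f(x)

g <- function(v) { v[1] <- 0; v }  # modification: tracemem reports a copy
invisible(g(x))
untracemem(x)

## The caller's 'x' is untouched: g() modified its own private copy.
sum(x)
```

Note that the copy is made inside g() at the moment of assignment, not at call time.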

'Which strategy is better in which situation' is difficult to answer.  'S 
Programming' (see the FAQ) has a lot of accumulated wisdom, though some of 
it has been superseded by changes to S and R.  We keep making changes to 
reduce copying (another slew of changes is planned for 2.6.0), so this is 
something that is very hard to keep up with.

We can tell you that some things are likely to be bad, and 'S Programming' 
is a good place to find out about most of those.
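One classic example of a pattern that is 'likely to be bad' (a sketch; the
function names here are illustrative, not from 'S Programming'): growing an
object element by element forces repeated reallocation and copying, while
preallocating the result does not.

```r
n <- 1e4

## Bad: c() reallocates and copies the accumulated vector on every iteration.
bad <- function() {
  out <- numeric(0)
  for (i in 1:n) out <- c(out, i^2)
  out
}

## Better: allocate the full result once, then fill it in place.
good <- function() {
  out <- numeric(n)
  for (i in 1:n) out[i] <- i^2
  out
}

stopifnot(identical(bad(), good()))
```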

On Wed, 11 Apr 2007, yoooooo wrote:

>
> I guess I have more reading to do.... Are there any websites where I can
> read up on memory management, or specifically on what happens when we
> 'pass in' variables, and which strategy is better in which situation?
>
> Thanks~
> - yoooooooo
>
>
> Prof Brian Ripley wrote:
>>
>> On Tue, 10 Apr 2007, yoooooo wrote:
>>
>>>
>>> Hi all, I'm just curious how memory management works in R... I need to
>>> run an optimization that keeps calling the same function with a large
>>> set of parameters... so I started to wonder if it's better to attach the
>>> variables first vs. passing them in (because that involves a lot of
>>> copying...)
>>
>> Your parenthetical comment is wrong: no copying is needed to 'pass in' a
>> variable.
>>
>>> Thus, I do this
>>> fn3 <- function(x, y, z, a, b, c){ sum(x, y, z, a, b, c) }
>>> fn4 <- function(){ sum(x, y, z, a, b, c) }
>>>
>>> rdn <- rep(1.1, times=1e8)
>>> r <- proc.time()
>>> for (i in 1:5)
>>>  fn3(rdn, rdn, rdn, rdn, rdn, rdn)
>>> time1 <- proc.time() - r
>>> print(time1)
>>>
>>> lt <- list(x = rdn, y = rdn, z = rdn, a = rdn, b = rdn, c = rdn)
>>> attach(lt)
>>> r <- proc.time()
>>> for (i in 1:5)
>>>  fn4()
>>> time2 <- proc.time() - r
>>> print(time2)
>>> detach("lt")
>>>
>>> The output is
>>> [1] 25.691  0.003 25.735  0.000  0.000
>>> [1] 25.822  0.005 25.860  0.000  0.000
>>>
>>> Turns out attaching takes longer to run, which is counter-intuitive
>>> (unless the search of the pos=2 environment takes a long time as well).
>>> Do you guys know why this is the case?
>>
>> I would not trust timing differences of that nature: they often depend on
>> the state of the system, and in particular of the garbage collector.
>> You should be using system.time() for that reason: it calls the garbage
>> collector immediately before timing.
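
For example, a smaller variant of the code above (1e6 elements rather than
1e8, so it runs quickly), timed with system.time() so that the garbage
collector is run immediately before the measurement:

```r
## system.time() calls gc() before timing, so the result is less
## sensitive to leftover garbage from earlier computations.
rdn <- rep(1.1, 1e6)
fn3 <- function(x, y, z, a, b, c) sum(x, y, z, a, b, c)
t1 <- system.time(for (i in 1:5) fn3(rdn, rdn, rdn, rdn, rdn, rdn))
print(t1)
```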

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


