[R] memory use of copies

Philippe GROSJEAN Philippe.GROSJEAN at umons.ac.be
Wed Jan 29 10:29:53 CET 2014


For the last case with the list:

> x <- 1:2; y = list(x)[rep(1, 4)]
> .Internal(inspect(y))
@102bbe090 19 VECSXP g0c3 [MARK,NAM(2)] (len=4, tl=0)
  @106119628 13 INTSXP g0c1 [MARK] (len=2, tl=0) 1,2
  @106119628 13 INTSXP g0c1 [MARK] (len=2, tl=0) 1,2
  @106119628 13 INTSXP g0c1 [MARK] (len=2, tl=0) 1,2
  @106119628 13 INTSXP g0c1 [MARK] (len=2, tl=0) 1,2
> y[[1]][1] <- 2L # everybody copied
> .Internal(inspect(y))
@102fca698 19 VECSXP g0c3 [NAM(1)] (len=4, tl=0)
  @1061196b8 13 INTSXP g0c1 [] (len=2, tl=0) 2,2
  @106119688 13 INTSXP g0c1 [] (len=2, tl=0) 1,2
  @106119658 13 INTSXP g0c1 [] (len=2, tl=0) 1,2
  @106119718 13 INTSXP g0c1 [] (len=2, tl=0) 1,2
> y1 <- y[[1]]; y1[1] <- 3L; y[[1]] <- y1 # only one copied
> .Internal(inspect(y))
@102fca698 19 VECSXP g0c3 [MARK,NAM(1)] (len=4, tl=0)
  @10610b7a8 13 INTSXP g0c1 [MARK] (len=2, tl=0) 3,2
  @106119688 13 INTSXP g0c1 [MARK] (len=2, tl=0) 1,2
  @106119658 13 INTSXP g0c1 [MARK] (len=2, tl=0) 1,2
  @106119718 13 INTSXP g0c1 [MARK] (len=2, tl=0) 1,2

Assigning to a nested ("double") subset of a list, as in y[[1]][1] <- 2L, seems to trigger a full copy of the list, elements included, whereas `[[<-` on its own appears smart enough to avoid copying the other elements of the list.
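
One way to check this without comparing the inspect() output by eye is to count the distinct element addresses it reports. This is only a sketch: count_distinct_elements() is a hypothetical helper (not part of R), and it parses the text printed by .Internal(inspect()), which is an internal format and may change.

```r
# Hypothetical helper: count the distinct memory addresses of the elements
# of a list by parsing the text printed by .Internal(inspect()).
# Assumption: each element is reported on an indented line starting with
# "@<address>", as in the transcript above.
count_distinct_elements <- function(lst) {
  out  <- capture.output(.Internal(inspect(lst)))
  elt  <- grep("^\\s+@", out, value = TRUE)     # indented lines = list elements
  addr <- sub("^\\s+@(\\S+).*$", "\\1", elt)    # keep only the address
  length(unique(addr))
}

x <- c(1L, 2L)
y <- list(x)[rep(1, 4)]
n_shared <- count_distinct_elements(y)   # four elements pointing to one vector

y[[1]][1] <- 2L                          # nested ("double") subset assignment
n_nested <- count_distinct_elements(y)

z  <- list(x)[rep(1, 4)]
z1 <- z[[1]]; z1[1] <- 3L; z[[1]] <- z1  # modify a copy, then use `[[<-` alone
n_single <- count_distinct_elements(z)

print(c(shared = n_shared, nested = n_nested, single = n_single))
```

In the transcript above the nested form left four distinct elements; the counts may well differ in other R versions, so it is worth running the check on the R actually in use.
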
Best,

Philippe
..............................................<°}))><........
 ) ) ) ) )
( ( ( ( (    Prof. Philippe Grosjean
 ) ) ) ) )
( ( ( ( (    Numerical Ecology of Aquatic Systems
 ) ) ) ) )   Mons University, Belgium
( ( ( ( (
..............................................................

On 29 Jan 2014, at 00:53, Ross Boylan <ross at biostat.ucsf.edu> wrote:

> Thank you for a very thorough analysis.  It seems whether or not an
> operation makes a full copy really depends on the specific operation,
> and that it is not safe to assume that because I know something is
> unchanged there will be no copy.  For example, in your last case only
> one element of a list was modified, but all the list elements got new
> memory.
> 
> BTW, one reason I got into this, aside from wanting to save memory, is
> that I found my code was spending a lot of time in areas that probably
> involved getting new memory.  So it mattered for speed too.
> 
> Ross
> 
> On Mon, 2014-01-27 at 06:33 -0800, Martin Morgan wrote:
>> Hi Ross --
>> 
>> On 01/23/2014 05:53 PM, Ross Boylan wrote:
>>> [Apologies if a duplicate; we are having mail problems.]
>>> 
>>> I am trying to understand the circumstances under which R makes a copy
>>> of an object, as opposed to simply referring to it.  I'm talking about
>>> what goes on under the hood, not the user semantics.  I'm doing things
>>> that take a lot of memory, and am trying to minimize my use.
>>> 
>>> I thought that R was clever so that copies were created lazily.  For
>>> example, if a is a matrix, then
>>> b <- a
>>> b & a referred to the same object underneath, so that a complete
>>> duplicate (deep copy) wasn't made until it was necessary, e.g.,
>>> b[3, 1] <- 4
>>> would duplicate the contents of a to b, and then overwrite them.
>> 
>> Compiling your R with --enable-memory-profiling gives access to the tracemem() 
>> function, showing that your understanding above is correct
>> 
>>> b = matrix(0, 3, 2)
>>> tracemem(b)
>> [1] "<0x7054020>"
>>> a = b        ## no copy
>>> b[3, 1] = 2  ## copy
>> tracemem[0x7054020 -> 0x7053fc8]:
>>> b = matrix(0, 3, 2)
>>> tracemem(b)
>> [1] "<0x680e258>"
>>> b[3, 1] = 2  ## no copy
>>> 
>> 
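
When it is not known whether a given R build has memory profiling enabled, the same experiment can be guarded. A minimal sketch using only base R; the variable names mirror the transcript above, and the `traced` flag is introduced here just for the example.

```r
b <- matrix(0, 3, 2)
traced <- capabilities("profmem")   # TRUE if R was built with memory profiling
if (traced) tracemem(b)             # start reporting duplications of b

a <- b                              # second reference to the same data, no copy yet
b[3, 1] <- 2                        # modifying b forces the duplication; with
                                    # tracing on, a "tracemem[...]" line is expected

if (traced) untracemem(b)           # stop tracing
print(c(a = a[3, 1], b = b[3, 1]))  # a keeps its old value, b holds the new one
```

Without the guard, tracemem() stops with an error on builds that lack memory profiling.
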
>> The same is apparent using .Internal(inspect()), where the first field, 
>> @7053ec0, is the address of the data. The other relevant part is the 'NAM()' 
>> field, which indicates whether 0, 1, or at least 2 symbols refer (or have 
>> referred) to the data. NAM() goes from 1 on original creation (no duplication 
>> required on modify) to 2 after a = b (duplicate on modify).
>> 
>>> b = matrix(0, 3, 2)
>>> .Internal(inspect(b))
>> @7053ec0 14 REALSXP g0c4 [NAM(1),ATT] (len=6, tl=0) 0,0,0,0,0,...
>> ATTRIB:
>>   @7057528 02 LISTSXP g0c0 []
>>     TAG: @21c5fb8 01 SYMSXP g0c0 [LCK,gp=0x4000] "dim" (has value)
>>     @7056858 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 3,2
>>> b[3, 1] = 2
>>> .Internal(inspect(b))
>> @7053ec0 14 REALSXP g0c4 [NAM(1),ATT] (len=6, tl=0) 0,0,2,0,0,...
>> ATTRIB:
>>   @7057528 02 LISTSXP g0c0 []
>>     TAG: @21c5fb8 01 SYMSXP g0c0 [LCK,gp=0x4000] "dim" (has value)
>>     @7056858 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 3,2
>>> a = b
>>> .Internal(inspect(b))      ## data address unchanged
>> @7053ec0 14 REALSXP g0c4 [NAM(2),ATT] (len=6, tl=0) 0,0,0,0,0,...
>> ATTRIB:
>>   @7057528 02 LISTSXP g0c0 []
>>     TAG: @21c5fb8 01 SYMSXP g0c0 [LCK,gp=0x4000] "dim" (has value)
>>     @7056858 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 3,2
>>> b[3, 1] = 2
>>> .Internal(inspect(b))      ## data address changed
>> @7232910 14 REALSXP g0c4 [NAM(1),ATT] (len=6, tl=0) 0,0,2,0,0,...
>> ATTRIB:
>>   @7239d28 02 LISTSXP g0c0 []
>>     TAG: @21c5fb8 01 SYMSXP g0c0 [LCK,gp=0x4000] "dim" (has value)
>>     @7237b48 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 3,2
>> 
>> 
>>> 
>>> The following log, from R 3.0.1, does not seem to act that way; I get
>>> the same amount of memory used whether I copy the same object repeatedly
>>> or create new objects of the same size.
>>> 
>>> Can anyone explain what is going on?  Am I just wrong that copies are
>>> initially shallow?  Or perhaps that behavior only applies for function
>>> arguments?  Or doesn't apply for class slots or reference class
>>> variables?
>>> 
>>>> foo <- setRefClass("foo", fields=list(x="ANY"))
>>>> bar <- setClass("bar", slots=c("x"))
>> 
>> using the approach above, we can see that creating an S4 or reference object in 
>> the way you've indicated (validity checks or other initialization might change 
>> this) does not copy the data although it is marked for duplication
>> 
>>> x = 1:2; .Internal(inspect(x))
>> @7553868 13 INTSXP g0c1 [NAM(1)] (len=2, tl=0) 1,2
>>> .Internal(inspect(foo(x=x)$x))
>> @7553868 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
>>> .Internal(inspect(bar(x=x)@x))
>> @7553868 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
>> 
>> On the other hand, lapply is creating copies
>> 
>>> x = 1:2; .Internal(inspect(x))
>> @757b5a8 13 INTSXP g0c1 [NAM(1)] (len=2, tl=0) 1,2
>>> .Internal(inspect(lapply(1:2, function(i) x)))
>> @7551f88 19 VECSXP g0c2 [] (len=2, tl=0)
>>   @757b428 13 INTSXP g0c1 [] (len=2, tl=0) 1,2
>>   @757b3f8 13 INTSXP g0c1 [] (len=2, tl=0) 1,2
>> 
>> One can construct a list without copies
>> 
>>> x = 1:2; .Internal(inspect(x))
>> @7677c18 13 INTSXP g0c1 [NAM(1)] (len=2, tl=0) 1,2
>>> .Internal(inspect(list(x)[rep(1, 2)]))
>> @767b080 19 VECSXP g0c2 [NAM(2)] (len=2, tl=0)
>>   @7677c18 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
>>   @7677c18 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
>> 
>> but that (creating a list of identical elements) doesn't seem to be a likely 
>> real-world scenario and the gain is transient
>> 
>>> x = 1:2; y = list(x)[rep(1, 4)]
>>> .Internal(inspect(y))
>> @507bef8 19 VECSXP g0c3 [NAM(2)] (len=4, tl=0)
>>   @514ff98 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
>>   @514ff98 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
>>   @514ff98 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
>>   @514ff98 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
>>> y[[1]][1] = 2L                  ## everybody copied
>>> .Internal(inspect(y))
>> @507bf40 19 VECSXP g0c3 [NAM(1)] (len=4, tl=0)
>>   @51502c8 13 INTSXP g0c1 [] (len=2, tl=0) 2,2
>>   @51502f8 13 INTSXP g0c1 [] (len=2, tl=0) 1,2
>>   @5150328 13 INTSXP g0c1 [] (len=2, tl=0) 1,2
>>   @5150358 13 INTSXP g0c1 [] (len=2, tl=0) 1,2
>> 
>> 
>> Probably it is more helpful to think of reducing the number of times an object 
>> is _modified_, e.g., representing data as vectors and doing vectorized updates.
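
To illustrate that last point, a minimal sketch (the size n and the names v_loop, v_vec and idx are made up for the example): the same update written as many single-element modifications and as one vectorized modification. Whether any individual modification triggers a copy depends on the circumstances discussed above; the vectorized form simply gives R far fewer occasions to make one.

```r
n   <- 1e5
idx <- seq(1, n, by = 2)            # hypothetical positions to update

# Element by element: length(idx) separate modifications of v_loop
v_loop <- numeric(n)
for (i in idx) v_loop[i] <- v_loop[i] + 1

# Vectorized: a single modification of v_vec
v_vec <- numeric(n)
v_vec[idx] <- v_vec[idx] + 1

print(identical(v_loop, v_vec))     # both forms should give the same result
```
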
>> 
>> Martin
>> 
>>>> mycoef <- list(a=matrix(rnorm(200000), ncol=2000), b=array(rnorm(200000),
>>> dim=c(4, 5, 10000)))
>>>> gc()
>>>              used   (Mb) gc trigger    (Mb)   max used    (Mb)
>>>  Ncells   2650747  141.6    4170209   222.8    4170209   222.8
>>>  Vcells 799751724 6101.7 1711485496 13057.6 1711485493 13057.6
>>>> a <- lapply(1:100, function(i) bar(x=mycoef))   # create 100 objects that
>>> contain copies
>>>> gc()
>>>              used   (Mb) gc trigger    (Mb)   max used    (Mb)
>>>  Ncells   2652156  141.7    4170209   222.8    4170209   222.8
>>>  Vcells 839752640 6406.9 1711485496 13057.6 1711485493 13057.6
>>> # +305 Mb
>>>> b <- lapply(1:100, function(i) foo(x=mycoef))   # same with a reference class
>>>> gc()
>>>              used   (Mb) gc trigger    (Mb)   max used    (Mb)
>>>  Ncells   2654761  141.8    4170209   222.8    4170209   222.8
>>>  Vcells 879756752 6712.1 1711485496 13057.6 1711485493 13057.6
>>> # also + 305 Mb
>>>> rm("a", "b")
>>>> gc()
>>>              used   (Mb) gc trigger    (Mb)   max used    (Mb)
>>>  Ncells   2650660  141.6    4170209   222.8    4170209   222.8
>>>  Vcells 799751664 6101.7 1711485496 13057.6 1711485493 13057.6
>>> # write to "copy" to see if it uses more memory
>>>> a <- lapply(1:100, function(i) {r <- bar(x=mycoef); r@x$a[5, 10] <- 33; r} )
>>>> gc()
>>>              used   (Mb) gc trigger    (Mb)   max used    (Mb)
>>>  Ncells   2652174  141.7    4170209   222.8    4170209   222.8
>>>  Vcells 839752684 6406.9 1711485496 13057.6 1711485493 13057.6
>>> # also + 305 Mb
>>>> rm("a", "b")
>>>  Warning message:
>>>  In rm("a", "b") : object 'b' not found
>>>> gc()
>>>              used   (Mb) gc trigger    (Mb)   max used    (Mb)
>>>  Ncells   2650680  141.6    4170209   222.8    4170209   222.8
>>>  Vcells 799751684 6101.7 1711485496 13057.6 1711485493 13057.6
>>> # now create completely distinct objects
>>>> a <- lapply(1:100, function(i) {acoef <- list(a=matrix(rnorm(200000),
>>> ncol=2000), b=array(rnorm(200000), dim=c(4, 5, 10000)))
>>> +                                 bar(x=acoef)})
>>>> gc()
>>>              used   (Mb) gc trigger    (Mb)   max used    (Mb)
>>>  Ncells   2652191  141.7    4170209   222.8    4170209   222.8
>>>  Vcells 839752699 6406.9 1711485496 13057.6 1711485493 13057.6
>>> # + 305 Mb
>>> 
>>> Thanks.
>>> Ross Boylan
>>> 
>>> P.S. I also tried posting this from a google-managed email account, and have got
>>> back two messages like this:
>>> Mail Delivery Subsystem mailer-daemon at googlemail.com
>>> 
>>> 
>>> 5:22 PM (28 minutes ago)
>>> 
>>> 
>>> to me
>>> 
>>> This is an automatically generated Delivery Status Notification
>>> 
>>> THIS IS A WARNING MESSAGE ONLY.
>>> 
>>> YOU DO NOT NEED TO RESEND YOUR MESSAGE.
>>> 
>>> Delivery to the following recipient has been delayed:
>>> 
>>> r-help at r.project.org <mailto:r-help at r.project.org>
>>> 
>>> Message will be retried for 1 more day(s)
>>> 
>>> Technical details of temporary failure:
>>> The recipient server did not accept our requests to connect. Learn more at
>>> http://support.google.com/mail/bin/answer.py?answer=7720
>>> <http://support.google.com/mail/bin/answer.py?answer=7720>
>>> [(0) r.project.org <http://r.project.org>
>>> . [206.188.192.100]:25: Connection refused]
>>> 
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> 
> 