[R] save/load doubles memory [oops]

luke-tierney at uiowa.edu
Tue Sep 17 21:39:04 CEST 2013


At this point R's serialization format only preserves sharing of
environments; any other sharing is lost. Changing this will require an
extensive rewrite of serialization. It would be useful to have this,
especially as we are trying to increase sharing/decrease copying, but
it isn't likely any time soon.
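
A minimal sketch of what this means in practice, using only base R. The object names (x, shared_vec, shared_env) are illustrative and not from the thread, and the ratios are approximate. A vector referenced twice from a list is written out twice, while an environment referenced twice is written once and restored as a single shared object.

```r
# Illustrative objects only; none of these names come from the thread.
x <- runif(1e5)                       # roughly 800 kB of doubles

# Two list elements pointing at the same vector (shared in memory).
shared_vec <- list(a = x, b = x)

n_one  <- length(serialize(x, NULL))           # bytes for one copy
n_both <- length(serialize(shared_vec, NULL))  # bytes for the list
ratio_vec <- n_both / n_one                    # close to 2: vector written twice

# The same data held in an environment that is referenced twice.
# emptyenv() as parent keeps unrelated objects out of the serialized stream.
e <- new.env(parent = emptyenv())
e$x <- x
shared_env <- list(a = e, b = e)

n_env <- length(serialize(shared_env, NULL))
ratio_env <- n_env / n_one                     # close to 1: environment written once

# After a round trip both elements still refer to one environment.
u <- unserialize(serialize(shared_env, NULL))
same_env <- identical(u$a, u$b)

print(c(ratio_vec = ratio_vec, ratio_env = ratio_env))
print(same_env)
```

save() and load() go through the same serialization code, so the same behaviour should be expected for .rdata files. Holding large shared data in an environment is one way to keep a single copy across save/load, at the cost of reference semantics for that data.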

Best,

luke

On Tue, 17 Sep 2013, Ross Boylan wrote:

> On Tue, 2013-09-17 at 12:06 -0700, Ross Boylan wrote:
>> Saving and loading data is roughly doubling memory use.  I'm trying to
>> understand and correct the problem.
> Apparently I had the process memories mixed up: R1 below was the one
> with 4G and R2 with 2G.  So there's less of a mystery.  However...
>>
>> R1 was an R process using just over 2G of memory.
>> I did save(r3b, r4, sflist, file="r4.rdata")
>> and then, in a new process R2,
>> load(file="r4.rdata")
>>
>> R2 used just under 4G of memory, i.e., almost double the original
>> process.  The r4.rdata file was just under 2G, which seemed like very
>> little compression.
>>
>> r4 was created by
>> r4 <- sflist2stanfit(sflist)
>>
>> I presume that r4 and sflist shared most of their memory.
>> The save() apparently lost the information that the memory was shared,
>> doubling memory use.
> Still wondering if this is going on.
>>
>> R 2.15.1, 64 bit on linux.
>>
>> First, does my diagnosis sound right?  The reports of memory use in R2
>> are quite a bit lower than the process footprint; is that normal?
>>> gc()  # after loading data
>>             used   (Mb) gc trigger   (Mb)  max used   (Mb)
>> Ncells   1988691  106.3    3094291  165.3   2432643  130.0
>> Vcells 266976864 2036.9  282174979 2152.9 268661172 2049.8
>>> rm("r4")
>>> gc()
>>             used   (Mb) gc trigger   (Mb)  max used   (Mb)
>> Ncells   1949626  104.2    3094291  165.3   2432643  130.0
>> Vcells 190689777 1454.9  282174979 2152.9 268661172 2049.8
>>> r4 <- sflist2stanfit(sflist)
>>> gc()
>>             used   (Mb) gc trigger   (Mb)  max used   (Mb)
>> Ncells   1970497  105.3    3094291  165.3   2432643  130.0
>> Vcells 228827252 1745.9  296363727 2261.1 268661172 2049.8
>>>
> It seems the recreated r4 used about 300M less memory than the one read
> in from disk.  This suggests that some of the sharing was lost in the
> save/load process.
>
>>
>> Even weirder, R1 reports memory use well beyond the memory I show the
>> process using (2.2G)
> Not a mystery after getting the right processes.  Actually, I'm a little
> surprised the process memory is less than the max used memory; I thought
> giving back memory was not possible on Linux.
>>> gc()
>>              used   (Mb) gc trigger   (Mb)  max used   (Mb)
>>  Ncells   3640941  194.5    5543382  296.1   5543382  296.1
>>  Vcells 418720281 3194.6  553125025 4220.1 526708090 4018.5
>>
>>
>> Second, what can I do to avoid the problem?
>
> Now a more modest problem, though still a problem.
>>
>> I guess in this case I could skip saving r4 and recreate it after loading,
>> but is there a more general solution?
>>
>> If I did myboth <- list(r4, sflist) and
>> save(myboth, file="myfile")
>> would that be enough to keep the objects together?  Judging from the
>> size of the file, it seems not.
>>
>> Even if the myboth trick worked it seems like a kludge.
>>
>> Ross Boylan
>>
>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


