[Rd] [R-pkg-devel] Run garbage collector when too many open files

Jan van der Laan rhelp @ending from eoo@@dd@@nl
Tue Aug 7 15:53:12 CEST 2018


Dear Uwe,

(When replying to your message, I sent the reply to r-devel and not 
r-package-devel, as Martin Maechler suggested that this thread would be 
a better fit for r-devel.)

Thanks. In the example below I used rm() explicitly, but in general 
users wouldn't do that.

One of the reasons for the large number of file handles is that 
sometimes unnamed temporary objects are created. For example:

 > library(ldat)
 > library(lvec)
 >
 > a <- lvec(10, "integer")
OPENFILE '/tmp/RtmpVqkDsw/file32145169fb06/lvec3214753f2af0'
 > b <- as_rvec(a[1:3])
OPENFILE '/tmp/RtmpVqkDsw/file32145169fb06/lvec32146a50f383'
OPENFILE '/tmp/RtmpVqkDsw/file32145169fb06/lvec3214484b652c'
 > print(b)
[1] 0 0 0
 >
 >
 > gc()
CLOSEFILE '/tmp/RtmpVqkDsw/file32145169fb06/lvec3214484b652c'
CLOSEFILE '/tmp/RtmpVqkDsw/file32145169fb06/lvec32146a50f383'
           used (Mb) gc trigger (Mb) max used (Mb)
Ncells  796936 42.6    1442291 77.1  1168576 62.5
Vcells 1519523 11.6    4356532 33.3  4740854 36.2


For debugging, I log when files are opened and closed. The call a[1:3] 
(which creates a slice of a) creates two temporary objects [1]. Their 
files are only closed when I explicitly call gc(), or at some other 
unpredictable moment when R happens to run the garbage collector.
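
To make the timing issue concrete without lvec, here is a minimal base-R 
sketch. It assumes nothing about how lvec is implemented: the names 
open_handles, close_handle and new_handle are hypothetical stand-ins, and 
the "file" is just a counter. It only shows that a finalizer registered 
with reg.finalizer() runs when the garbage collector runs, not when the 
object becomes unreachable.

```r
# Hypothetical stand-in for a file-backed object: an environment whose
# finalizer "closes the file" by decrementing a counter.
open_handles <- 0L

# Finalizer defined at top level so it does not capture the handle itself.
close_handle <- function(e) open_handles <<- open_handles - 1L

new_handle <- function() {
  h <- new.env()
  open_handles <<- open_handles + 1L   # "open" the file
  reg.finalizer(h, close_handle)       # "close" it when h is collected
  h
}

use_temporary <- function() {
  tmp <- new_handle()   # temporary object, unreachable after return
  invisible(NULL)
}

use_temporary()
print(open_handles)   # usually still 1: no collection has been triggered yet

invisible(gc())       # force a collection; pending finalizers are run
print(open_handles)   # should now be 0
```

With many such temporaries created between two collections, the number of 
open handles can reach the per-process limit before R decides to collect.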

I hope this illustrates the problem better.


Best,
Jan


[1] One improvement would be to create fewer temporary files; often these 
contain only very little information that is better kept in memory. But 
that is only a partial solution.




On 07-08-18 15:24, Uwe Ligges wrote:
> Why not add functionality that allows the user to delete an object and 
> run its cleanup code?
> 
> Best,
> Uwe Ligges
> 
> 
> 
> On 07.08.2018 14:26, Jan van der Laan wrote:
>>
>>
>> In my package I open handles to temporary files from C++; handles to 
>> them are returned to R through vptr objects. The files are deleted 
>> when the corresponding R object is deleted and the garbage collector 
>> runs:
>>
>> a <- lvec(10, "integer")
>> rm(a)
>>
>> Then when the garbage collector runs the file is deleted. However, on 
>> some platforms (probably with lower limits on the maximum number of 
>> file handles a process can have open), I run into the problem that the 
>> garbage collector doesn't run often enough. In this case that means 
>> that another package of mine using this package generates an error 
>> when its tests are run.
>>
>> The simplest solution is to add some calls to gc() in my tests. But a 
>> more general/automatic solution would be nice.
>>
>> I thought about something along the lines of
>>
>> robust_lvec <- function(...) {
>>    tryCatch({
>>      lvec(...)
>>    }, error = function(e) {
>>      gc()
>>      lvec(...) # duplicated code
>>    })
>> }
>>
>> i.e. try to open a file and, when that fails, call the garbage 
>> collector and try again. However, this introduces duplicated code (in 
>> this case only one line, but it can be more), and doesn't help if it 
>> is another function that tries to open a file.
>>
>> Is there a better solution?
>>
>> Thanks!
>>
>> Jan
>>
>> ______________________________________________
>> R-package-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
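
The duplicated call in the quoted robust_lvec() could be factored out 
along the following lines. This is only a sketch: with_gc_retry and 
flaky_open are hypothetical names, not part of lvec or ldat, and the 
failing "open" is simulated. Note that it retries after any error, not 
only after running out of file handles, and it still does not help when 
some other function opens the file.

```r
# Hypothetical helper: call f(...); if it fails, run gc() once so that
# finalizers can close unreachable files, then try exactly one more time.
with_gc_retry <- function(f, ...) {
  tryCatch(
    f(...),
    error = function(e) {
      gc()     # give pending finalizers a chance to release handles
      f(...)   # second and final attempt; an error here propagates
    }
  )
}

# Simulated opener that fails on its first call only (an assumption made
# purely for illustration).
attempts <- 0L
flaky_open <- function(n) {
  attempts <<- attempts + 1L
  if (attempts == 1L) stop("too many open files")
  n
}

res <- with_gc_retry(flaky_open, 10)
print(res)        # 10
print(attempts)   # 2
```

With this helper the wrapper from the quoted message would reduce to 
robust_lvec <- function(...) with_gc_retry(lvec, ...).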


