[Rd] memory management

Yuri D'Elia wavexx at users.sf.net
Thu Aug 13 18:09:27 CEST 2009


Hi everyone. In response to my previous message (Memory management
issues), I've come up with the following patch against R 2.9.1.

To summarize the situation:

- We're hitting memory limits in our lab when running concurrent R
  processes, because of the large datasets we use.
- To reduce overall memory usage, we want to avoid copying data back
  and forth between our R extension and R.

There were some very useful suggestions on the list, but none of them
was optimal.

With this patch, I export two new functions from memory.c,
R_RegisterObject and R_UnregisterObject, which simply make it possible
to bypass allocVector. They accept a SEXP node (which needs to be
allocated and initialized externally), protect it from collection by
calling R_ProtectObject, and link it temporarily into the GC's oldest
and largest heap generation until the object is unregistered.

Since these functions require knowledge of the inner workings of the
SEXP object, they are exported only if USE_RINTERNALS is defined.

By using these two functions, we developed a simple R extension that
loads data.frames directly from COW (copy-on-write) memory pages by
using mmap(), resulting in significant memory sharing between
various processes using the same datasets (and instantaneous load
times). This allowed us to write most of our code directly in R
instead of resorting to C because of performance or memory constraints.
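
For illustration only, the copy-on-write mapping mentioned above could
be sketched roughly as follows. This is not the code from the extension
or from the attached patch: the names cow_view, cow_map_doubles and
cow_unmap are hypothetical, the file is assumed to contain nothing but
raw doubles, and the step that wraps the mapped memory in a SEXP and
hands it to R_RegisterObject is left out because it depends on the
patch itself.

```c
/* Sketch: map a file of raw doubles copy-on-write with mmap(). */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper type: a private (COW) view of a file. */
typedef struct {
    double *data;   /* start of the mapping */
    size_t  n;      /* number of doubles in the mapping */
    size_t  bytes;  /* mapping length in bytes */
} cow_view;

/* Map `path` into memory. Returns 0 on success, -1 on failure. */
int cow_map_doubles(const char *path, cow_view *out)
{
    assert(path != NULL && out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }

    /* MAP_PRIVATE: pages stay shared with other processes mapping the
       same file until they are written to; a write copies only the
       touched page and never reaches the file on disk. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping remains valid after close() */
    if (p == MAP_FAILED)
        return -1;

    out->data  = p;
    out->bytes = (size_t)st.st_size;
    out->n     = out->bytes / sizeof(double);
    return 0;
}

/* Release the mapping and reset the view. */
void cow_unmap(cow_view *v)
{
    if (v->data != NULL)
        munmap(v->data, v->bytes);
    v->data  = NULL;
    v->n     = 0;
    v->bytes = 0;
}
```

Several processes calling cow_map_doubles() on the same file share the
same physical pages until one of them writes, which is where the memory
savings come from; nothing is read eagerly, so the "load" itself is
close to instantaneous.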

Could someone review the attached patch and spot any potential
problems? Is a change like this likely to be integrated into the R
sources? We would like to release our current R extension for anyone
to use.

Thanks.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: r-extgc.diff
Type: text/x-patch
Size: 1743 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20090813/e1a0ca1c/attachment.bin>

