[Rd] A memory management question

dhinds at sonic.net
Mon Sep 5 19:18:10 CEST 2005


Luke Tierney <luke at stat.uiowa.edu> wrote:

> It might or might not work now but is not guaranteed to do so reliably
> in the future.  Seeing the risks of leaving SETLENGTH exposed, it is
> very likely that SETLENGTH will be removed from the sources after the
> 2.2.0 release.

> If you provide your own methods to read and write the external pointer
> then you don't need this; this is safer than relying on undocumented
> behavior of [ and [<- in any case.  You also then don't need to use
> R_PreserveObject unless you really need to use it from the C level
> outside of a context where an R reference exists.

I'm not sure I follow this.  Maybe I should explain the context for
the problem.

textConnection("xyz", "w") creates a connection, the output of which
is deposited in a char vector named "xyz", which is updated line by
line as output is sent to the connection.  The current code maintains
a pointer to "xyz" in the form of an unprotected SEXP.  Hence if the
user does rm(xyz), the vector can be garbage collected and the
connection is left writing through a dangling pointer.  A small bug,
I admit.

I think the best fix is to use a protected reference to the result
vector.  I think this is safe and doesn't rely on any abuse of the
interfaces.

There's also a performance issue: the result is updated after every
line of output, which causes a vast amount of copying if a large
result is accumulated.  This is the part that could be fixed by using
SETLENGTH to manage the length of the protected result vector.
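
To make the copying cost concrete, here is a minimal sketch in plain
C.  It is not R's internal code and uses none of R's API.  It counts
element copies for two strategies: growing the result by exactly one
slot per line (the per-line update described above), and
over-allocating while tracking a separate logical length, which is
the role SETLENGTH would play.  The function names and the doubling
policy are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: count element copies when the result vector is
   reallocated at exactly the new length for every line of output. */
size_t copies_exact_growth(size_t nlines)
{
    size_t copies = 0;
    const char **vec = NULL;
    for (size_t n = 0; n < nlines; n++) {
        /* allocate a fresh vector of length n + 1, copy the old one */
        const char **bigger = malloc((n + 1) * sizeof *bigger);
        if (bigger == NULL) { free(vec); return (size_t)-1; }
        if (n > 0) memcpy(bigger, vec, n * sizeof *bigger);
        copies += n;
        bigger[n] = "line";
        free(vec);
        vec = bigger;
    }
    free(vec);
    return copies;
}

/* Same workload, but with spare capacity and a separate logical
   length.  Doubling the capacity is an assumed growth policy. */
size_t copies_overalloc(size_t nlines)
{
    size_t copies = 0, length = 0, capacity = 0;
    const char **vec = NULL;
    for (size_t n = 0; n < nlines; n++) {
        if (length == capacity) {
            size_t newcap = capacity ? 2 * capacity : 1;
            const char **bigger = malloc(newcap * sizeof *bigger);
            if (bigger == NULL) { free(vec); return (size_t)-1; }
            if (length > 0) memcpy(bigger, vec, length * sizeof *bigger);
            copies += length;
            free(vec);
            vec = bigger;
            capacity = newcap;
        }
        vec[length++] = "line";   /* only the logical length changes */
        assert(length <= capacity);
    }
    free(vec);
    return copies;
}
```

For 1000 lines the first strategy copies 499500 elements and the
second copies 1023, so the over-allocated version does linear rather
than quadratic work.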

I'm not sure what you mean by undocumented behavior of [ and [<-.  I
think all I'm relying on is that, as long as an outstanding reference
to the result vector exists, R has to make sure the reference remains
valid, and hence can't change the memory allocation of the result
vector in any way.  I don't care what else happens to the
contents of the vector, as long as I get to control when it is
released.  It is ok with me if the user modifies the result vector
in-place, since my reference stays valid.  So I don't actually care
how [ and [<- work.

I think the only undocumented thing I'm relying on is that the memory
manager doesn't pay attention to the LENGTH of objects that it isn't
actively doing anything to.  Currently, it actually only uses LENGTH
in one spot: for updating R_LargeVallocSize when a large vector is
released.  The true allocation sizes for individual objects are always
kept in another place (either by malloc, or in the node class of the
object).
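
The distinction between the logical length and the true allocation
size can be sketched with a toy allocator in plain C.  This is an
analogy only, not the code in R's memory manager: the toy_vec type,
its fields, and both counters are hypothetical.  One counter is
adjusted using the true size recorded at allocation time; the other
is adjusted using the logical length on release, standing in for the
single spot mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical vector header, not R's SEXPREC. */
typedef struct {
    size_t length;     /* logical length, adjustable after allocation */
    size_t true_size;  /* slots actually allocated, fixed at allocation */
    double *data;
} toy_vec;

static size_t exact_slots = 0;   /* bookkeeping based on true_size */
static size_t length_slots = 0;  /* bookkeeping based on length at release */

toy_vec *toy_alloc(size_t n)
{
    toy_vec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->data = malloc((n ? n : 1) * sizeof *v->data);
    if (v->data == NULL) { free(v); return NULL; }
    v->length = v->true_size = n;
    exact_slots += n;
    length_slots += n;
    return v;
}

/* Change only the logical length; the allocation itself is untouched. */
void toy_set_length(toy_vec *v, size_t n)
{
    assert(n <= v->true_size);
    v->length = n;
}

void toy_release(toy_vec *v)
{
    exact_slots -= v->true_size;  /* stays correct after toy_set_length */
    length_slots -= v->length;    /* drifts if the length was shortened */
    free(v->data);
    free(v);
}

size_t toy_exact_slots(void)  { return exact_slots; }
size_t toy_length_slots(void) { return length_slots; }
```

Allocating 100 slots, shortening the logical length to 10, and then
releasing the vector frees the full allocation and returns the exact
counter to 0, while the length-based counter is left at 90.  In this
toy model that statistic is the only thing the shortened length
disturbs.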

It seems like in this limited usage, SETLENGTH does represent a useful
feature, by permitting safe over-allocation of a protected object, and
might be worth preserving (and documenting) for that purpose.  

Of course, the real problem here is the semantics of textConnection(),
which make life much more difficult and can't be changed because they
are specified outside of R.

-- Dave


