Saving memory usage -- .C(....., DUP = FALSE) danger?

Thomas Lumley thomas@biostat.washington.edu
Thu, 26 Nov 1998 14:54:31 -0800 (PST)


On Thu, 26 Nov 1998, Martin Maechler wrote:
> 
> But then I wondered ``more generally'' :
> 
> 	What exactly happens / can happen when calling, e.g.,
> 
> 		r <- .C("foo", x=x, y=as.double(y),  DUP = FALSE)
> 
> 	Will 'x' be altered after the call to .C(*)  if in C's 
> 		foo(double *x, double *y)
> 	x is altered?
> 	Will 'y' be unaltered anyway, since   "as.double(y)" produces a
> 	different object than 'y' anyway?

x will be altered, y will not.  If you want y altered then you have to
set its storage mode to "double" before the call (for example with
storage.mode(y) <- "double") and then pass y itself. 
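
To make the x/y distinction concrete, here is a minimal C sketch of what the
routine ends up seeing. It is only an illustration, not how R implements .C:
the length argument n, the wrapper call_foo_nodup, the 16-element limit, and
the assumption that y is stored as integer (so that as.double(y) has to build
a new object) are all mine.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the C routine from the question: it overwrites both
   arguments in place. The length argument n is an addition for this
   sketch; the original foo takes only the two pointers. */
void foo(double *x, double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        x[i] *= 2.0;
        y[i] += 1.0;
    }
}

/* Hypothetical model of .C("foo", x = x, y = as.double(y), DUP = FALSE)
   for n <= 16. x is handed over without a duplicate, while y (assumed
   to be stored as integer) is first converted into a fresh temporary,
   and only that temporary reaches the routine. */
void call_foo_nodup(double *x, const int *y, size_t n)
{
    double y_tmp[16];
    assert(n <= 16);
    for (size_t i = 0; i < n; i++)
        y_tmp[i] = (double) y[i];   /* as.double(y): a new object */
    foo(x, y_tmp, n);               /* writes land in x and y_tmp only */
}
```

After call_foo_nodup(x, y, n) the caller's x holds the doubled values, but
the caller's y is untouched because the routine only ever wrote to the
temporary.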

> 
> Really useful might be a comprehensive list of recommendations 
> on when  "DUP = FALSE" is useful / advisable / detestable ...
> 

Here's a start.  

DUP=FALSE is dangerous.

There are two important dangers with DUP=FALSE. The first
is that garbage collection may move the object, resulting in the pointers
pointing nowhere useful and causing hard-to-reproduce bugs.

The second is that if you pass a formal parameter of the calling function
to .C/.Fortran with DUP=FALSE I don't think it is necessarily copied. You
may be able to change not only the local variable but the variable one
level up. This will also be very hard to trace.

1) If your C/Fortran routine calls back any R function including
S_alloc/R_alloc then do not use DUP=FALSE. Don't even think about it.  
Calling almost any R function could trigger garbage collection.

2) If you don't trigger garbage collection, it is safe and useful to set
DUP=FALSE provided you don't change any of the variables that might be
affected, e.g.
	.C("Cfunction",input=x,output=numeric(10),DUP=FALSE)
In this case the output variable didn't exist before the call so it can't
cause trouble. If the input variable is not changed in Cfunction you are
safe.
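
As a rough sketch of a routine that fits case 2), the C side might look like
the following. The body (a running sum) and the extra length argument n are
assumptions of mine; the call in the example passes only input and output.
Declaring input as pointer-to-const is one cheap way to have the compiler
reject accidental writes to it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical body for "Cfunction": it reads input and writes only to
   output, so the caller's input object is never modified. The length
   argument n is an addition for this sketch; .C passes integers as
   pointers to int, hence const int *n. */
void Cfunction(const double *input, double *output, const int *n)
{
    assert(input != NULL && output != NULL && n != NULL);
    double running = 0.0;
    for (int i = 0; i < *n; i++) {
        running += input[i];    /* read-only access to the input */
        output[i] = running;    /* all writes go to the fresh output */
    }
}
```

No R function is called back and nothing is allocated through R, so this
routine also stays clear of the garbage collection problem in 1).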



I've commented before (but never actually done anything) that it would be
a useful intermediate step to have analogues of the Fortran 90 INTENT IN
and INTENT OUT declarations for these functions. In the example above
there is no need to copy the input back after calling Cfunction and no
need to copy the output before calling (just to allocate the space).
Something like
	.C("Cfunction",input=x,output=numeric(10),IN=c(T,F),OUT=c(F,T))
might then say to copy x and allocate uninitialised space for numeric(10),
call the function, and then copy output back again. The first component of
the result would then be NULL, saving space in the local environment as
well. These would be less efficient but less dangerous than DUP=FALSE as
you couldn't mess up R's internal structures by getting the declarations
wrong.
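
The copy-in/copy-out idea could be sketched in plain C roughly as below.
Nothing like this exists in R as far as this message is concerned: the names
call_with_intent, routine_fn and negate_and_clobber are made up for the
illustration, and the routine signature with an int length is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void (*routine_fn)(double *in, double *out, int n);

/* A deliberately badly behaved routine: it fills out, then scribbles
   over its input. */
void negate_and_clobber(double *in, double *out, int n)
{
    for (int i = 0; i < n; i++) {
        out[i] = -in[i];
        in[i] = 0.0;
    }
}

/* Hypothetical copy-in/copy-out call. The IN argument is copied before
   the call and never copied back; the OUT argument gets uninitialised
   space and is copied back afterwards. Returns 0 on success and -1 if
   allocation fails. */
int call_with_intent(routine_fn f, const double *input, double *result, int n)
{
    assert(f != NULL && input != NULL && result != NULL && n > 0);
    size_t bytes = (size_t) n * sizeof(double);
    double *in_copy = malloc(bytes);   /* IN: private copy */
    double *out_buf = malloc(bytes);   /* OUT: allocated, not initialised */
    if (in_copy == NULL || out_buf == NULL) {
        free(in_copy);
        free(out_buf);
        return -1;
    }
    memcpy(in_copy, input, bytes);
    f(in_copy, out_buf, n);            /* routine sees only private buffers */
    memcpy(result, out_buf, bytes);    /* OUT: copied back after the call */
    free(in_copy);                     /* IN: discarded, never copied back */
    free(out_buf);
    return 0;
}
```

Even though negate_and_clobber overwrites its input, the caller's data
survives, because the routine only ever touched the private copy. That is the
sense in which wrong declarations cost copies rather than corrupting anything.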



Thomas Lumley
Assistant Professor, Biostatistics
University of Washington, Seattle




-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._