[Rd] number of copies

Mon Oct 3 17:38:54 CEST 2011

Terry,

On Oct 3, 2011, at 10:32 AM, Terry Therneau wrote:

> I'm looking at memory efficiency for some of the survival code.  The
> following fragment appears in coxph.fit
>    coxfit <- .C("coxfit2", iter=as.integer(maxiter),
>                   as.integer(n),
>                   as.integer(nvar), stime,
>                   sstat,
>                   x= x[sorted,] ,
> 	      ...
> 
> Does this make a second copy of x to pass to the routine (my
> expectation) or will I end up with 3: x and x[sorted,] in the local
> frame of reference, and another due to dup=TRUE?
> 

I'm not sure I'm counting your copies right, but I'd say the latter (although the sorting cannot technically be called a copy ;)).
There are four distinct objects:
  x -> x[sorted,] -> double array passed to C -> result vector
If you care about speed (or memory), you should definitely use .Call(), which passes the R objects to the C code directly instead of copying them in and out.

Note for debugging: tracemem is actually smart. It flags the intermediate memory object that .C creates for passing to the routine as a proper duplication, even though it is not a real one (no duplicate() is involved, since that object is not an R object at all). It then also flags the allocation of the result object as a duplication from the intermediate object. In summary, tracemem gives you the true number of copies.
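
A minimal sketch of tracemem() in use follows. It runs on ordinary R objects rather than a .C call, because a .C call would need a compiled routine, so it shows only the basic mechanism: tracing an object and seeing a message when R duplicates it. It assumes an R build with memory profiling enabled, which you can check with capabilities("profmem"). The matrix is a toy example, not the survival data.

```r
# Sketch: watch when R duplicates an object (copy-on-modify).
# Assumes R was built with memory profiling; otherwise tracemem() errors.
stopifnot(capabilities("profmem"))

x <- matrix(as.double(1:6), nrow = 3)
tracemem(x)        # start tracing x; returns an address tag like "<0x...>"

y <- x             # no copy yet: x and y refer to the same memory
y[1, 1] <- 0       # copy-on-modify: a "tracemem[... -> ...]" line is printed

untracemem(x)      # stop tracing x
stopifnot(x[1, 1] == 1, y[1, 1] == 0)  # x unchanged; y is a separate copy
```

Applying the same idea to the object handed to .C (e.g. tracemem() on x[sorted,] stored in a variable) should then report the extra copies described above.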

As far as I remember, .C is a legacy leftover from the ancient Fortran interface in the original S. It's not really a C interface at all: it is a Fortran interface that happens not to care about the source language, and C can be used to create Fortran-looking object code. So unless one needs Fortran, one should not be using .C ;). It can be used, but IMHO it should not be used for anything other than, maybe, didactic purposes.

Cheers,
Simon