[Rd] [patch] Behavior of .C() and .Fortran() when given double(0) or integer(0).

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu May 17 11:46:40 CEST 2012


On 04/05/2012 18:42, Pavel N. Krivitsky wrote:
> Dear R-devel,
>
> While tracking down some hard-to-reproduce bugs in a package I maintain,
> I stumbled on a behavior change between R 2.15.0 and the current R-devel
> (or SVN trunk).
>
> In 2.15.0 and earlier, if you passed a 0-length vector of the right
> mode (e.g., double(0) or integer(0)) as one of the arguments in a .C()
> call with DUP=TRUE (the default), the C routine would be passed NULL
> (the C pointer, not R NULL) in the corresponding argument. The current

Where did you get that from?  The documentation says it passes an (e.g.) 
double* pointer to a copy of the data area of the R vector.  There is no 
change in the documented behaviour ....  Now, of course a zero-length 
area can be at any address, but no particular address is stated anywhere that I am aware of.

> development version instead passes it a pointer to what appears to be
> the memory location immediately following the SEXP that holds the
> metadata for the argument. If the argument has length 0, this is often
> memory belonging to a different R object. (DUP=FALSE in 2.15.0
> appears to have the same behavior as R-devel.)
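The layout Pavel describes can be pictured with a minimal sketch. It assumes a simplified object in which a vector's data area immediately follows its header. The `toy_vector` type and the `toy_alloc`/`toy_data` helpers are hypothetical and are not R's real `SEXPREC` layout or API. For a zero-length vector the data pointer is still a valid, non-NULL address just past the header, but no element of the vector lives there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified vector object: a header immediately followed
   by the elements. This is NOT R's real SEXPREC layout. */
typedef struct {
    int type;        /* stand-in for R's type and metadata fields */
    size_t length;   /* number of elements */
    double data[];   /* flexible array member: elements follow the header */
} toy_vector;

/* Allocate a toy vector with room for n doubles (n may be 0). */
toy_vector *toy_alloc(size_t n)
{
    toy_vector *v = malloc(sizeof(toy_vector) + n * sizeof(double));
    if (v != NULL) {
        v->type = 0;
        v->length = n;
    }
    return v;
}

/* The address a native routine would receive: the start of the data area.
   For length 0 it is a valid non-NULL address just past the header, but no
   element lives there, so the callee must never dereference it. */
double *toy_data(toy_vector *v)
{
    assert(v != NULL);
    return v->data;
}
```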
>
> .C() documentation and Writing R Extensions don't explicitly specify a
> behavior for 0-length vectors, so I don't know if this change is
> intentional, or whether it was a side-effect of the following news item:
>
>        .C() and .Fortran() do less copying: arguments which are raw,
>        logical, integer, real or complex vectors and are unnamed are not
>        copied before the call, and (named or not) are not copied after
>        the call.  Lists are no longer copied (they are supposed to be
>        used read-only in the C code).
>
> Was the change in the empty vector behavior intentional?
>
> It seems to me that standardizing on the behavior of giving the C
> routine NULL is safer, more consistent with other memory-related
> routines, and more convenient: whereas dereferencing a NULL pointer is
> an immediate (and therefore easily traced) segfault, dereferencing an

That's not true, in general.

> invalid pointer that is nevertheless in the general memory area
> allocated to the program often causes subtle errors down the line;
> R_alloc asked to allocate 0 bytes returns NULL, at least on my platform;

Again, undocumented and should not be relied on.

> and the C routine can easily check if a pointer is NULL, but with the
> R-devel behavior, the programmer has to add an explicit way of telling
> that an empty vector was passed.
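Pavel's argument assumes that zero-size allocations yield NULL. Even plain C's `malloc` does not guarantee this. The C standard makes a zero-byte request implementation-defined: it may return NULL or a unique non-NULL pointer that must not be used to access an object. The following is a minimal sketch of a check that is correct on both kinds of platform. The helper names `alloc_ok` and `alloc_doubles` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* malloc(0) may legitimately return NULL or a non-NULL pointer, so NULL
   only signals failure when n > 0. The length, not the pointer, tells
   the caller whether the area is empty. */
bool alloc_ok(const void *p, size_t n)
{
    return p != NULL || n == 0;
}

/* Allocate n doubles; on success *out holds the pointer (possibly NULL
   when n == 0) and the function returns true. */
bool alloc_doubles(size_t n, double **out)
{
    assert(out != NULL);
    *out = malloc(n * sizeof(double));
    return alloc_ok(*out, n);
}
```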

It's no different from any other vector length: it is easy for careless 
programmers to read/write off the ends of the allocated area, and this 
is why in R-devel we have an option to check for that (and of course 
also what valgrind is good at finding in an instrumented version of R).
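Ripley's point is that an empty vector is just the boundary case of any length. The simplest way to honour it is to pass the length explicitly and bound every access by it. The following is a minimal sketch of a routine written for `.C()`, where every argument arrives as a pointer. The routine name `sum_doubles` and its argument order are hypothetical. Because the loop never runs when `*n == 0`, the routine is correct whatever address it receives for an empty vector, whether NULL or not.

```c
#include <assert.h>

/* Hypothetical routine in the .C() calling convention: every argument
   arrives as a pointer and the vector length is passed explicitly.
   Hypothetical R call:
     .C("sum_doubles", as.double(x), as.integer(length(x)), total = double(1))$total */
void sum_doubles(double *x, int *n, double *total)
{
    double s = 0.0;
    assert(*n >= 0);
    /* The loop is bounded by *n, not by any property of the pointer:
       when *n == 0 the body never runs and x is never dereferenced,
       whatever address it happens to hold. */
    for (int i = 0; i < *n; i++)
        s += x[i];
    *total = s;
}
```

Reading or writing `x[*n]` or beyond is the same bug for every length, including 0. That is the error the new R-devel checks and valgrind are meant to catch.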

> I've attached a small test case (dotC_NULL.* files) that shows the
> difference. The C file should be built with R CMD SHLIB, and the R file
> calls the functions in the library with a variety of arguments. The outputs I
> get from running
> R CMD BATCH --no-timing --vanilla --slave dotC_NULL.R
> on R 2.15.0, R trunk, and R trunk with my patch (described below) are attached.
>
> The attached patch (dotC_NULL.patch) against the current trunk
> (affecting src/main/dotcode.c) restores the old behavior for DUP=TRUE
> (i.e., 0-length vector -> NULL pointer) and extends it to the DUP=FALSE
> case. It does so by checking if an argument --- if it's of mode raw,
> integer, real, or complex --- to a .C() or .Fortran() call has length 0,
> and, if so, sets the pointer to be passed to NULL and then skips the
> copying of the C routine's changes back to the R object for that
> argument. The additional computing cost should be negligible (i.e.,
> checking if vector length equals 0 and break-ing out of a switch
> statement if so).
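The logic the patch describes can be sketched in isolation; note that Ripley declines this proposal below. The `dotc_arg` structure and both helpers are hypothetical simplifications for illustration only. They are not the code in `src/main/dotcode.c`, and only a `double` vector is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one double-vector argument to .C(). */
typedef struct {
    size_t length;   /* number of elements in the R vector */
    double *data;    /* the R vector's own data area */
} dotc_arg;

/* Proposed rule: a zero-length argument reaches the native routine as a
   NULL pointer; otherwise pass `area`, which is either a copy (DUP=TRUE)
   or a->data itself (DUP=FALSE). */
double *pointer_for_call(const dotc_arg *a, double *area)
{
    assert(a != NULL);
    return a->length == 0 ? NULL : area;
}

/* After the call, copy the routine's changes back into the R vector,
   skipping zero-length arguments entirely. */
void copy_back(dotc_arg *a, const double *from_callee)
{
    if (a->length == 0 || from_callee == NULL)
        return;                      /* nothing was passed, nothing to copy */
    if (from_callee != a->data)      /* DUP=TRUE: the routine saw a copy */
        memcpy(a->data, from_callee, a->length * sizeof(double));
}
```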
>
> The patch appears to work, at least for my package, and R CMD check
> passes for all recommended packages (on my 64-bit Linux system), but
> this is my first time working with R's internals, so handle with care.

That's easy: we will not be changing this.  In particular, the new 
checks I refer to above rely on passing the address of an in-process 
memory area with guard bytes.
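Ripley's objection is that the new checks need a real address surrounded by guard bytes, which a NULL pointer cannot provide. The following is a generic sketch of guard-byte checking, not R's implementation. The guard size, fill pattern and `guarded_*` names are hypothetical. With a zero-length area, the pointer handed out sits at the start of the trailing guard, so any write through it is detected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN 64    /* hypothetical guard size in bytes */
#define GUARD_BYTE 0xA5 /* hypothetical fill pattern */

/* Layout of the block: [guard | data area of nbytes | guard] */
typedef struct {
    unsigned char *block;
    size_t nbytes;      /* size of the data area; may be 0 */
} guarded_buf;

bool guarded_alloc(guarded_buf *g, size_t nbytes)
{
    g->block = malloc(nbytes + 2 * GUARD_LEN);
    if (g->block == NULL)
        return false;
    g->nbytes = nbytes;
    memset(g->block, GUARD_BYTE, GUARD_LEN);
    memset(g->block + GUARD_LEN + nbytes, GUARD_BYTE, GUARD_LEN);
    return true;
}

/* Address given to the native routine: never NULL, even for nbytes == 0. */
unsigned char *guarded_data(const guarded_buf *g)
{
    assert(g->block != NULL);
    return g->block + GUARD_LEN;
}

/* After the call: false if anything wrote into either guard region. */
bool guards_intact(const guarded_buf *g)
{
    const unsigned char *lo = g->block;
    const unsigned char *hi = g->block + GUARD_LEN + g->nbytes;
    for (size_t i = 0; i < GUARD_LEN; i++)
        if (lo[i] != GUARD_BYTE || hi[i] != GUARD_BYTE)
            return false;
    return true;
}

void guarded_free(guarded_buf *g)
{
    free(g->block);
    g->block = NULL;
}
```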

>                                     Best,
>                                     Pavel Krivitsky
>
>
>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


