[Rd] Why did R 3.0's resolveNativeRoutine remove full-search ability?

Thomas Lumley tlumley at uw.edu
Wed Apr 23 23:09:18 CEST 2014


On Sat, Apr 19, 2014 at 2:29 AM, Simon Urbanek
<simon.urbanek at r-project.org> wrote:
> Andrew,
>
> On Apr 18, 2014, at 9:55 AM, Andrew Piskorski <atp at piskorski.com> wrote:
>
>> In versions of R prior to 3.0, by default .C and .Call would find the
>> requested C function regardless of which shared library it was located
>> in.  You could use the PACKAGE argument to restrict the search to a
>> specific library, but doing so was not necessary for it to work.
>>
>> R 3.0 introduced a significant change to that behavior; from the NEWS
>> file:
>>
>>  CHANGES IN R 3.0.0:
>>  PERFORMANCE IMPROVEMENTS:
>>    * A foreign function call (.C() etc) in a package without a PACKAGE
>>      argument will only look in the first DLL specified in the
>>      NAMESPACE file of the package rather than searching all loaded
>>      DLLs.  A few packages needed PACKAGE arguments added.
>>
>> That is not merely a performance improvement, it is a significant
>> change in functionality.  Now, when R code in my package foo tries to
>> call C code located in bar.so, it fails with a "not resolved from
>> current namespace (foo)" error.  It works if I change all my uses of
>> .C and .Call to pass a PACKAGE="bar" argument.  Ok, I can make that
>> change in my code, no big deal.
>>
>> What surprises me though, is that there appears to be no way to invoke
>> the old (and very conventional Unix-style), "I don't want to specify
>> where the function is located, just keep searching until you find it"
>> behavior.  Is there really no way to do that, and if so, why not?
>>
>> Comparing the R sources on the 3.1 vs. 2.15 branches, it looks as if
>> this is due to some simple changes to resolveNativeRoutine in
>> "src/main/dotcode.c".  Specifically, the newer code adds this:
>>
>>   errorcall(call, "\"%s\" not resolved from current namespace (%s)",
>>             buf, ns);
>>
>> And removes these lines:
>>
>>   /* need to continue if the namespace search failed */
>>   *fun = R_FindSymbol(buf, dll.DLLname, symbol);
>>   if (*fun) return args;
>>
>> Is that extra call to R_FindSymbol really all that's necessary to
>> invoke the old "keep searching" behavior?  Would it be a good idea to
>> provide an optional way of finding a native routine regardless of
>> where it's located, perhaps via an optional PACKAGE=NA argument to .C,
>> .Call, etc.?
>>
>> And now I see that help(".Call") says:
>>
>>   'PACKAGE = ""' used to be accepted (but was undocumented): it is
>>    now an error.
>>
>> I assume passing PACKAGE="" used to invoke the same "keep searching"
>> behavior as not passing any PACKAGE argument at all.  So apparently
>> the removal of functionality was intentional.  I'd like to better
>> understand why.  Why should that be an error?  Or said another way,
>> why has traditional Unix-style symbol resolution been banned from use
>> with .C and .Call ?
>>
>
> I cannot speak for the author, but a very strong argument is to
> prevent (symbol) namespace issues. If you cannot even say where the
> symbol comes from, you have absolutely no way of knowing that the
> symbol you get has anything to do with the symbol you intended to
> get, because you could get any random symbol in any shared object
> that may or may not have anything to do with your code. Note that
> even you as the author of the code have no control over the
> namespace so although you intended this to work, loading some other
> package can break your code - and in a fatal manner since this will
> typically lead to a segfault. Do you have any strong use case for
> allowing this given how dangerous it is? Ever since symbol
> registration has been made easy, it's much more efficient and safe
> to use symbols directly instead.
>


As a follow-up to this, note that with traditional Unix symbol
resolution it was forbidden to have two different routines with the
same name linked into an object. That just isn't an option for R
because of the package system.  This isn't theoretical: the PACKAGE=
argument was introduced when resolving to the wrong symbol became a
real problem late last century
(http://marc.info/?l=r-devel&m=107151103308418&w=2), but there wasn't
a good cross-package calling mechanism for quite a while. Now there is
a cross-package mechanism that works, whereas the Unix-style approach
cannot be made to work safely with packages from multiple authors.
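
To make the name-collision point concrete, here is a minimal,
self-contained C sketch. It is not R's loader and does not use any R
API: the table `loaded`, the lookup function `find_symbol`, and the
routines `foo_init` and `bar_init` are all hypothetical names invented
for illustration. It only assumes that an unrestricted search walks
the loaded objects in load order and returns the first match.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical routines exported by two unrelated packages.
 * Both authors happened to pick the same symbol name, "init". */
static int foo_init(void) { return 1; }  /* lives in package "foo" */
static int bar_init(void) { return 2; }  /* lives in package "bar" */

typedef int (*routine_t)(void);

typedef struct {
    const char *dll;     /* name of the shared object / package */
    const char *symbol;  /* exported symbol name */
    routine_t   fun;     /* address of the routine */
} entry_t;

/* Stand-in for the list of loaded shared objects, in load order.
 * Here "foo" happens to have been loaded before "bar". */
static const entry_t loaded[] = {
    { "foo", "init", foo_init },
    { "bar", "init", bar_init },
};
static const size_t n_loaded = sizeof loaded / sizeof loaded[0];

/* Look up `symbol`.
 * dll == NULL: search every loaded object and return the first match
 *              (the "keep searching until you find it" behaviour).
 * dll != NULL: search only the named object (the PACKAGE= behaviour).
 * Returns NULL when nothing matches. */
routine_t find_symbol(const char *symbol, const char *dll)
{
    for (size_t i = 0; i < n_loaded; i++) {
        if (dll != NULL && strcmp(loaded[i].dll, dll) != 0)
            continue;  /* restricted search: skip other objects */
        if (strcmp(loaded[i].symbol, symbol) == 0)
            return loaded[i].fun;
    }
    return NULL;
}
```

With this table, find_symbol("init", NULL) hands back foo's routine
simply because foo was loaded first, even if the caller meant bar's;
find_symbol("init", "bar") is unambiguous. Swapping the two table
entries changes the result of the unrestricted call without any change
to the calling code, which is the hazard described above.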

   -thomas


-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland


