[Rd] compiling C code using headers from another R package

Hervé Pagès hpages at fhcrc.org
Tue Mar 12 23:25:25 CET 2013


On 03/12/2013 02:15 PM, Simon Urbanek wrote:
>
> On Mar 12, 2013, at 4:56 PM, Hervé Pagès wrote:
>
>>
>>
>> On 03/12/2013 12:53 PM, Simon Urbanek wrote:
>>>
>>> On Mar 12, 2013, at 3:35 PM, Hervé Pagès wrote:
>>>
>>>>
>>>>
>>>> On 03/12/2013 11:56 AM, Simon Urbanek wrote:
>>>>>
>>>>> On Mar 12, 2013, at 2:48 PM, Hervé Pagès wrote:
>>>>>
>>>>>> On 03/12/2013 11:09 AM, Simon Urbanek wrote:
>>>>>>>
>>>>>>> On Mar 12, 2013, at 2:01 PM, Hervé Pagès wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 03/12/2013 09:55 AM, Simon Urbanek wrote:
>>>>>>>>>
>>>>>>>>> On Mar 12, 2013, at 12:30 PM, Kevin Horan wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>     Thanks for your input. To clarify, I don't need to use any part of GSL in my R code, nor do I wish to make any part of it accessible to users of eiR. I need it to compile other C/C++ code (LSH KIT), which I did not write, that will itself be used in eiR.
>>>>>>>>>>     My goal is to allow the user to install eiR without also having to install GSL beforehand.
>>>>>>>>>
>>>>>>>>> If your package is on CRAN they won't need to as we are providing Mac and Windows binaries.
>>>>>>>>
>>>>>>>> I think that at least on Windows, the user would still need to have the
>>>>>>>> GSL installed on his/her machine.
>>>>>>>>
>>>>>>>
>>>>>>> Why?
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> Linux users can get the binaries from their distro, so the dependencies are installed automatically.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> The target audience is people in bioinformatics who may not know how to install something like GSL. It seems like what I was suggesting is not such a good idea if it will be hard to reliably find the header files from another R package. I could also push all of GSL into eiR, but as GSL has over 5000 files, this makes the package very large (>22 MB) and slow to compile, both of which are a problem when submitting a package to Bioconductor. It may very well be that leaving GSL as an external dependency of eiR is really the best and easiest way, but I just wanted to see if there was any way to make it easier for the user.
>>>>>>>>>
>>>>>>>>> Can you clarify what you mean by "user"? The vast majority of R users use binaries, so all this is irrelevant to them as they don't need to install GSL at all.
>>>>>>>>
>>>>>>>> FWIW we currently have 7 or 8 Bioconductor packages that require the
>>>>>>>> GSL as an external system lib. That's because even if we provide Windows
>>>>>>>> binaries for those packages, those binaries are dynamically linked.
>>>>>>>> Is there a way to build those binaries that would avoid that dependency?
>>>>>>>>
>>>>>>>
>>>>>>> Yes, use static libgsl.
>>>>>>
>>>>>> Ah yes, of course. Thanks for the reminder. Just checked with Dan and
>>>>>> turns out that we are using that libgsl.a too (made by Brian D. Ripley)
>>>>>> on our build system. Some Bioconductor packages have README or INSTALL
>>>>>> files that still mention that the Windows user needs to install the GSL
>>>>>> but that doesn't seem to be the case so we'll make sure this information
>>>>>> gets updated.
>>>>>>
>>>>>> Also the SystemRequirements field in the DESCRIPTION file can be a
>>>>>> little bit confusing, as it suggests that everybody requires the stuff
>>>>>> listed there when it actually depends on whether the user is installing
>>>>>> the binary or not and how the binary was made.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Anyway, to answer Kevin's original question:
>>>>>>>>
>>>>>>>>   how do I know where the GSL library and header files, packaged
>>>>>>>>   in GSLR, would live so I can point the compiler at them?
>>>>>>>>
>>>>>>>> Use the LinkingTo field.
>>>>>>>>
>>>>>>>
>>>>>>> No, you're not linking to another package, you're linking to a *library*. LinkingTo uses R's own mechanism for symbol detection in another *package*. I know, the name is a bit misleading but those are two different things.
>>>>>>
>>>>>> I understand that but that's what Kevin wants right, i.e. linking
>>>>>> to another package.
>>>>>
>>>>> No (you cannot link to another package - that's the misnomer), he wants to link to a library provided by another package (see his e-mail, he's asking about how to locate the GSL library supplied with RGSL, not to RGSL itself).
>>>>
>>>> Maybe I misunderstand what the OP wants to do, and the only way to know
>>>> would be for him to clarify, but IIUC the RGSL package would only
>>>> contain the GSL library. At least that's how I understand it (and
>>>> that's what makes sense to me). So the RGSL package would contain
>>>> only 1 shared object (or 1 shared object per sub-arch): the GSLR.so
>>>> file (extension may vary). So I'm not sure what's the difference between
>>>> linking to the "GSL library supplied with RGSL" and linking to
>>>> "RGSL itself"?
>>>>
>>>
>>> The GSL library inside RGSL is libgsl.* (where * is a, dylib, so, or dll depending on the type and OS). This is *not* the same as RGSL.so/dll, which would be the package shared object compiled by R for the package. There is a big difference: the shared objects that R creates for packages cannot be linked to; they are meant to be used with dlopen()/dlsym(), which is what R uses to load symbol entry points. That is also the reason why using LinkingTo: requires explicit exposure of symbols by the package providing them as well as explicit loading of the symbols by the package that uses them. This is entirely different from linking to a library: in the latter case the linker (not R) establishes the connection between the symbols in the library and the references to them, and in the case of a static library they get copied into the binary that is being created (which is why you don't need it anymore after linking).
>>
>> OK, thanks for the clarification. I agree with you that putting
>> libgsl.* inside RGSL sounds like a complicated solution and that
>> LinkingTo wouldn't work.
>>
>> FWIW I was suggesting the use of LinkingTo with a setup where RGSL
>> contains the GSL source code under src/ and the header files
>> under inst/include/. Plus an extra file under src/ for registering
>> the GSL API. As mentioned earlier, this is probably not the best
>> way to go, but that *should* work.
>>
>
> In theory, yes, but it's less efficient and requires you to get the declarations of the imported symbols right. I'd argue that it's really error-prone unless you have some auto-generator for the necessary R API code. This is really intended for R packages exposing a few calls, not for re-mapping calls of other libraries through a package. It doesn't mean you can't do it, but I wouldn't want to maintain such a package :).

Yes, with a script for auto-generating the necessary R API code.
I've done this by hand for a package that exposes 132 symbols
and I know how painful it is. I would certainly not do this by hand
for a beast like GSL or BOOST ;-)
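
For what it's worth, here is a minimal sketch in plain C of why the
per-symbol work adds up and how a generator could help. It is only a toy
model, not R's implementation: toy_register() and toy_lookup() are
hypothetical stand-ins for what R_RegisterCCallable() and
R_GetCCallable() (declared in R_ext/Rdynload.h) do, lib_square() and
lib_add() are made-up placeholders for library functions, and the
TOY_API list is my own invention. The idea is to keep a single list of
symbols (an X-macro) and expand it once into the provider's
registrations and once into the client's typed pointers, so that each
declaration is written in one place only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic function-pointer type; an assumption standing in for the
   role that R's DL_FUNC plays. */
typedef void (*generic_fn)(void);

/* Toy registry: a stand-in for R's table of C-callable entry points. */
#define MAX_ENTRIES 16
static struct { const char *name; generic_fn fn; } registry[MAX_ENTRIES];
static int n_entries = 0;

/* Provider side: expose one symbol under a name. Returns 0 on success,
   -1 if the table is full. */
static int toy_register(const char *name, generic_fn fn)
{
    if (n_entries >= MAX_ENTRIES) return -1;
    registry[n_entries].name = name;
    registry[n_entries].fn = fn;
    n_entries++;
    return 0;
}

/* Client side: look a symbol up by name at run time; NULL if absent. */
static generic_fn toy_lookup(const char *name)
{
    for (int i = 0; i < n_entries; i++)
        if (strcmp(registry[i].name, name) == 0) return registry[i].fn;
    return NULL;
}

/* Hypothetical "library" functions living in the provider package. */
static double lib_square(double x) { return x * x; }
static int lib_add(int a, int b) { return a + b; }

/* One line per exported symbol: X(name, return type, argument list). */
#define TOY_API(X)                  \
    X(lib_square, double, (double)) \
    X(lib_add, int, (int, int))

/* Provider: one registration call is generated per symbol. */
#define REGISTER(name, ret, args) toy_register(#name, (generic_fn) name);
void provider_init(void) { TOY_API(REGISTER) }

/* Client: one typed function pointer is generated per symbol. */
#define DECLARE(name, ret, args) static ret (*p_##name) args;
TOY_API(DECLARE)

/* Client: resolve each pointer by name. The cast is NOT checked by the
   compiler, so a wrong signature in TOY_API would go unnoticed here. */
#define RESOLVE(name, ret, args)                   \
    p_##name = (ret (*) args) toy_lookup(#name);   \
    assert(p_##name != NULL);
void client_init(void) { TOY_API(RESOLVE) }
```

A caller would run provider_init() and then client_init(), after which
p_lib_square(3.0) and p_lib_add(2, 5) reach the registered functions.
Adding a symbol means adding one line to TOY_API, which is the part a
script would generate for a library with hundreds of entry points.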

>
>
>> The reason I'm interested in clarifying this is that we are facing
>> a similar situation with other libraries (e.g. the BOOST library)
>> used by some Bioconductor packages. Right now, each Bioconductor
>> package includes its own version of the BOOST source code, which
>> is of course less than optimal. Ideally we'd want to wrap the BOOST
>> source (or a subset of it, it's huge!) in something like an rBOOST
>> package and use a setup similar to what I describe above for RGSL
>> (i.e. using LinkingTo). Are there better ways? Is there something
>> like an RcppBOOST package? It sounds like, as with the GSL, it would
>> be better to install the static BOOST libs on the build machine and
>> have client packages link against that
>
> That's what we do on CRAN (at least I do for Mac binaries - I didn't check Windows).

Good to know. Sounds like we should check the availability of the
BOOST static libs for Windows before we take a decision.

>
> Note that the packages can still use internal version of BOOST regardless.

That's what they do right now. They take a long time to compile. And the
Linux user has to recompile the entire thing every time there is a new
version of the package, even if it's only a comma that was added in the
man page for a function that has nothing to do with BOOST. And this for
every package that uses an internal version of BOOST.

> Actually, with boost I'd argue that's not a bad idea, because that way the package knows that it's using a version that works (since there are version compatibility issues with Boost).

Sounds like an argument maybe in favor of going for the rBOOST solution
over having the BOOST static libs on the build machine. That way the
client package knows exactly what to expect: they take it or they leave
it (i.e. use their own internal BOOST). We've already seen the tricky
situation where 2 BioC packages require 2 different versions of the
same external library. A real pain from a build system maintenance
perspective.

>
>
>> (but that also means more
>> complexity in the client packages since they need a configure script).
>>
>
> They don't necessarily - only if there are some special options they want to enable/disable depending on some run-time checks.

I see.

Thanks,
H.

>
> Cheers,
> Simon
>
>
>
>> Thanks,
>> H.
>>
>>
>>>
>>> Cheers,
>>> Simon
>>>
>>>
>>>> Very confusing to me. Thanks for your time and sorry if I'm missing something obvious.
>>>>
>>>> H.
>>>>
>>>>>
>>>>>
>>>>>> Yes there is the extra difficulty to register all the C functions in the GSL API (as pointed out by Dirk) but that's another story.
>>>>>>
>>>>>
>>>>> That's not another story - that's very much part of the story why using LinkingTo is not a good idea.
>>>>>
>>>>> Cheers,
>>>>> Simon
>>>>>
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> H.
>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Simon
>>>>>>>
>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> H.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Simon
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> So, any other suggestions about how this could be accomplished?  Thanks.
>>>>>>>>>>
>>>>>>>>>> Kevin
>>>>>>>>>>
>>>>>>>>>> On 03/12/2013 05:26 AM, Simon Urbanek wrote:
>>>>>>>>>>> Kevin,
>>>>>>>>>>>
>>>>>>>>>>> On Mar 11, 2013, at 5:20 PM, Kevin Horan wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I am developing an R package, eiR, which depends on another C library, the GNU Scientific Library (GSL). In order to make life easier for the user, it would be nice to not have this as an external dependency, thus I would like to wrap this library in another R package, say GSLR for example. Thus far I know how to do this. The C code in eiR requires the .so library and the header files from GSL in order to compile. So the idea is that eiR would depend on GSLR, then GSLR gets compiled and installed first, then, while eiR is installing, it should be able to make use of the GSL library and header files while compiling. So my question is, how do I know where the GSL library and header files, packaged in GSLR, would live so I can point the compiler at them? I know how to find the installed directory of an R package from within R, but is there a way to find that out using just Makevars or a Makefile? I'm open to suggestions about a better way to organize all of this as well. I like the idea of keeping the GSL code separate so that it can be updated/changed independently from eiR though.
>>>>>>>>>>> Have a look at Rcpp.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>     I'm also aware of the gsl R library on CRAN, however, this just wraps GSL in R functions, but I need to use the GSL C functions in other C code in eiR.
>>>>>>>>>>>>
>>>>>>>>>>> Why is what you are proposing any better than simply using GSL in eiR? You will still need the GSL external dependency for GSLR, and you are only adding a lot of complexity by linking into another package's external directory (you cannot use libs), which is in itself very tricky (you'll have to deal with both static and shared versions, multi-arch setups, possible relocation, etc.). It won't make it any easier on the user; rather the contrary, as there will be more things to break. The only reason Rcpp goes to such lengths to do this is because it has no choice (the Rcpp library has to use the same libR, so it cannot be used as an external dependency) - I would certainly not recommend it for something as trivial as providing GSL.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Simon
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> Kevin
>>>>>>>>>>>>
>>>>>>>>>>>> ______________________________________________
>>>>>>>>>>>> R-devel at r-project.org mailing list
>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Hervé Pagès
>>>>>>>>
>>>>>>>> Program in Computational Biology
>>>>>>>> Division of Public Health Sciences
>>>>>>>> Fred Hutchinson Cancer Research Center
>>>>>>>> 1100 Fairview Ave. N, M1-B514
>>>>>>>> P.O. Box 19024
>>>>>>>> Seattle, WA 98109-1024
>>>>>>>>
>>>>>>>> E-mail: hpages at fhcrc.org
>>>>>>>> Phone:  (206) 667-5791
>>>>>>>> Fax:    (206) 667-1319
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


