[Rd] FW: [Rcpp-devel] Question on 5.6 Interfacing C++ code

Fri Apr 22 21:22:16 CEST 2011

smcguffee wrote:
> 
> Hi Charlie, 
> 
> Thanks for the help,
> 
> I think some of my story of having been reading the documentation and
> playing with examples for weeks has gotten lost in the switch of threads.
> I think most of that confusion also comes from me not figuring out how to
> connect different sections of the documentation. I think I get it now that
> just because I can do 'R CMD SHLIB X.cc X_main.cc' from a command line
> doesn't mean that I need to put that command into a package directly, and
> even that I can't explicitly put that line in a package because it's
> magically done for me. I appreciate folks having patience with me as some
> of my questions seem redundant, but it is all starting to come together
> for me.
> 

When I first started out extending R with compiled code, I used R CMD SHLIB
as well. Don't know why exactly, it was probably the first thing I stumbled
across in the manual. Once I learned about making packages and that putting
C, C++ or Fortran code in the `src` directory of the package magically
caused a library to be built, I quit using R CMD SHLIB---probably haven't
touched it in years.

I think R CMD SHLIB may be intended more for compiling external programs
that want to hook into the R libraries rather than things intended to be
loaded by R itself.

smcguffee wrote:
> 
> At this point I think I am beginning to get a good enough idea of how this
> stuff is working on the R interface side of things. I pretty much just
> have one more question:
> 
> How do I let users adjust their system specific paths to non-R libraries
> for my package installation but not for everyone else's package
> installation? I get the feeling users can control things in my package
> somehow through their R configurations if I use the
> PKG_LIBS = `$(R_HOME)/bin/Rscript -e "Rcpp:::LdFlags()"` command in the
> src/Makevars file. However, I'm still lost as to how this would be
> customized to my package. I mean, that command doesn't specify anything
> unique to my package and could potentially be used by other package
> installations too. That file is inside my package, so I don't think users
> can modify it directly and explicitly with their system specific paths
> before they install. Maybe if other packages link to extra libraries it
> doesn't hurt anything. Is that the answer? Would users need to add all my
> requisite non-R libraries into their R configurations to get
> `$(R_HOME)/bin/Rscript -e "Rcpp:::LdFlags()"` to link my package correctly
> and let all other packages link to way more libraries than necessary?
> 

Well, the best answers to this question lie inside the "Writing R
Extensions" manual---specifically Section 1.2 "Configure and Cleanup".  The
short version is:

 - If the code in your package needs custom compiler flags, add a
`src/Makevars` file that contains them.

 - If the code in your package depends on external libraries, add a
`configure` script, written using GNU autotools, that will produce
`src/Makevars` from a template `src/Makevars.in` that contains the `-I` and
`-L` flags required to compile and link your code against the external
library.

However, I will again suggest taking this one step at a time:

  - Build a toy package that includes C or C++ code that needs to be
compiled. Observe how `R CMD INSTALL` compiles the code for you and how to
use `.onLoad` or `.First.lib` to `dyn.load` the resulting library when a
user runs `library` on your package.  Bonus points for reading enough of
"Writing R Extensions" to know if having an R NAMESPACE in your package has
any effect on this process.

  - Extend your toy package to include C++ code that needs custom compiler
flags. See how you can achieve this with `src/Makevars`.

  - Extend your package again with an external dependency that requires a
`configure` script. A good example of such a package is `rgdal`---it has to
find both the GDAL and PROJ4 libraries in order to compile operational code.
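
As a rough sketch of the first step, here is the kind of toy C routine that
could go into the `src` directory. The file name `src/add_one.c` and the
function name `add_one` are made up for illustration. It assumes the `.C`
interface, where every argument arrives as a pointer and results are written
back in place, so it needs no R headers at all.

```c
/* src/add_one.c (hypothetical file name, for illustration only) */

/* Adds 1.0 to each of the *n elements of x, modifying x in place.
   Both arguments are pointers because the .C interface passes
   everything by pointer and the routine returns nothing. */
void add_one(double *x, int *n)
{
    for (int i = 0; i < *n; i++) {
        x[i] += 1.0;
    }
}
```

Once the package's shared library has been loaded, a call from R would look
something like `.C("add_one", x = as.double(1:3), n = as.integer(3))$x`.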

If you run into any trouble along the way, stop and read "Writing R
Extensions". If you really get stuck, you can then ask the mailing list a
very focused question along with an example that shows what is going wrong
for you. Then you have a good chance of getting helpful answers.  Right now
your questions are spanning the entire spectrum from beginning to advanced
package authoring and so the most likely answer you will get from the list
is "slow down and read the manual".

smcguffee wrote:
> 
> Thanks for your help,
> 
> Sean
> 
> P.S.
> 
> The rest of this message is my rambling, so only those interested in my
> thoughts should continue reading. Especially those interested in sparing
> their own time should stop reading here--the question above is my last
> inquiry for the list. What comes below is just my train of thoughts/flow
> of consciousness spewing needlessly.
> 
> It was definitely a good idea for me to look in the R source code. It
> seems that dynload.c names.c dotcode.c Rdynload.c were of most interest to
> me in understanding that magical unicorn with an adorable animated cartoon
> story. I found that link quite enjoyable by the way!
> 
> Regarding the files I just mentioned, I notice that the code is in the
> form of c files and that quite a lot of info from library files is used to
> get function pointers in the functions of interest to me. I wonder if
> making those files into cpp files that would get compiled with a c++
> compiler would let them call c++ functions directly or if the info to get
> the function pointers would be of a completely different type of syntax
> and/or if there is more to that story. I suppose it makes no difference in
> practice because one would probably still have to make a c++ wrapper
> function to interface with R, but I'm just curious about this stuff. I
> mean, in principle, it makes sense to be able to call a function directly
> without having to go through the trouble of wrapping it in c, especially
> for hundreds of C++ functions in a library. It might be that I can write
> one general argument handling function in C as is to interface with R and
> let it call any of my C++ functions in my libraries, slightly shortening
> my tasks.
> 
> Anyway, it was really eye opening to see that R is actually calling its
> own generic pointers to functions and just pre-assigning them to function
> pointers from libraries. I didn't know that could be done, and I imagine
> hackers must love that capacity, a capacity that seems to be inherent in c
> or c++. It does seem a little bit limiting that the arguments are limited
> in number and that each function pointer with a different number of
> arguments has to be conditionally called inside the R code.
> 
> However, I have the same complaint about bash having a limit on the total
> size of data that can be passed as arguments into an executable. It looks
> to me like fixing that type of thing in bash requires recompiling the
> kernel because it's hard wired non-dynamically into the capacity of
> launching executables themselves. I hope this type of thing starts to
> change as hardware is way exceeding the original expectations of the
> non-dynamically allocated original design of executable launch and dynamic
> allocation has clearly demonstrated its superiority in general. That type
> of thing comes into issue for me on command line scripts when I sometimes
> have lists of files that are longer than the capacity of command line
> arguments. For example, a "grep someText *" or "ls *" will only work if
> the size of the arguments in the * expansion is less than the system's
> capacity for arguments passed to executables. I hit that limit all the
> time, and that's annoying because scripts that normally work break in
> larger situations, rendering their applicability useless in what are
> typically more interesting cases.
> 
> Anyway, that's all a tangent from this R interfacing stuff. However, it
> was news to me that R could have a similar type of limit for functions in
> packages until I looked into the code. I don't think this is an issue in R
> because I'll just design one Rcpp argument to contain all the info I need
> inside itself. However, it's good to know that I need to do that.
> 
> Anyway, I'm also wondering if it might be easier to modify compilers
> themselves and/or incorporate their code into R's code, i.e. easier than
> doing all this work around to fit into their mold. In a way that is sort
> of done to access the function pointers from libraries, but I mean, it
> seems logical that a program such as R should be able to call any function
> with any number of arguments abstractly without needing to have the
> functions get conditionally called with a given number of arguments at
> compile time for R. I can imagine converting a string to a call to a
> number of arguments that is determined by the syntax of the string without
> being defined before the compilation of R. That type of idea, if possible,
> could allow a more dynamic range of options in packages, at least not
> limited by a number of arguments. Like I said, that's not important
> because one argument can contain an endless amount of info, but it sparked
> my curiosity. I might peek at GNU's gcc compiler collection to see if I
> can come up with some ideas for that type of thing--basically building
> dynamic compilation and execution options, but I imagine it would be way
> over my head, a long time coming, and of course potentially unstable. The
> long and short of it for me is that it was way cool to see how R is
> calling C functions from packages or non-R libraries.
> 

Quite a brain dump there!  Some things that you may want to look into in the
future:

  - The original mailing list you posted to, Rcpp, is for an R package that
wraps the C API of R into C++ classes.  I would bet it also provides methods
for calling R code and C++ without having to write as many R functions.  I
have not had the pleasure of using Rcpp yet---Fortran was my first compiled
language and I am still moving my way up the food chain :)

  - The inline package may be of interest to you---it allows C, C++ and
Fortran programs to be stored as text strings at the R level and then
dynamically compiled, loaded and interfaced.  Could be along the lines of
what you were thinking about with "building dynamic compilation and
execution options".

  - Also, it is always fun to drop by the Omegahat project
(www.omegahat.org) and see what Duncan Temple Lang has been cooking up. He
has a couple of packages for interfacing R with compiled code via LibFFI
(rather than the built in pointer method you observed) and one package that
has the beginnings of some LLVM bindings.
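
For comparison, the built-in pointer method can be sketched in a few lines of
plain C. This is a simplified illustration and not R's actual source: the
names `generic_fn`, `call_with_args`, `MAX_ARGS` and `sum_two` are invented
here, and the limit of three arguments is arbitrary. The idea is that a
routine is stored as a generic function pointer and cast back to a signature
with the right number of arguments just before the call, which is why every
argument count needs its own branch and why there is a fixed upper limit.

```c
#include <stddef.h>

/* Generic function pointer type; stands in for a pointer looked up in a
   shared library (hypothetical name, not taken from R's source). */
typedef void (*generic_fn)(void);

#define MAX_ARGS 3  /* arbitrary limit chosen for this sketch */

/* Calls fn with nargs pointer arguments taken from args.
   Returns NULL when nargs is outside the supported range. */
void *call_with_args(generic_fn fn, int nargs, void **args)
{
    if (nargs < 0 || nargs > MAX_ARGS) {
        return NULL;
    }
    /* One cast and one branch per argument count. */
    switch (nargs) {
    case 0:
        return ((void *(*)(void))fn)();
    case 1:
        return ((void *(*)(void *))fn)(args[0]);
    case 2:
        return ((void *(*)(void *, void *))fn)(args[0], args[1]);
    default: /* nargs == 3 */
        return ((void *(*)(void *, void *, void *))fn)(args[0], args[1],
                                                       args[2]);
    }
}

/* Example target: adds the int behind b to the int behind a,
   then returns a. */
void *sum_two(void *a, void *b)
{
    *(int *)a += *(int *)b;
    return a;
}
```

A caller would pass `(generic_fn)sum_two`, the count `2`, and an array holding
the two argument pointers. A library such as LibFFI avoids the fixed branches
by constructing the call at run time instead.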

-Charlie

On 4/21/11 10:02 PM, "Sharpie" <chuck at sharpsteen.net> wrote:

> 
> smcguffee wrote:
>> 
>> You are right, I looked and I did find the R source code. However, it's
>> largely written in R! I mean, I don't know how to trace the R code where
>> INSTALL is recognized and follow it to a c or c++ level command. For
>> example these are hits in .R files, not c files, and I don't know how to
>> connect
>> 
>> ...
>> 
>> If you could point me to the functions that are called at a c or c++
>> level, I'd love to see what R is doing for myself.
>> Thanks!
>> Sean
>> 
> 
> Hi Sean!
> 
> Along with many other people in this thread, I would strongly recommend a
> top-down approach to this. Build a package, stick some stuff in the src
> folder, run R CMD INSTALL on it and see what happens. The reason I
> recommend this approach is that it lets you focus on writing a package
> that does something useful rather than the nuts and bolts of cross
> platform compilation and installation. R CMD INSTALL takes care of this
> for you automagically and it is very good at what it does.
> 
> I wrote a post some time back about building an example package from
> scratch that contains C code:
> 
> http://r.789695.n4.nabble.com/Writing-own-simulation-function-in-C-td1580190.html#a1580423
> 
> It begins with using the package.skeleton() function to kickstart
> things, discusses how to make sure the compiled code is dynamically loaded
> when a user runs library(your_package) and even discusses how to call R
> functions from inside of C functions and vice-versa. The example code is
> still available and I'm sure it could be generalized to C++ quite easily.
> There are also some other responses in that thread that offer useful
> advice.
> 
> 
> At the beginning it is just best to treat R CMD INSTALL as a magical
> unicorn that gets you where you need to go:
> 
> http://abstrusegoose.com/120
> (keep clicking the images to get the full story)
> 
> 
> If you are absolutely, positively dying to know what really happens...
> well, the relevant files in the R source are
> `src/library/tools/R/install.R` and `src/library/tools/R/build.R`.
> 
> 
> But seriously. Magical unicorn. Takes care of the hard stuff so you can
> build awesome packages.
> 
> Hope this helps!
> 
> -Charlie

-----
Charlie Sharpsteen
Undergraduate-- Environmental Resources Engineering
Humboldt State University