[Rd] using autoconf in packages [Was: Question on 5.6 Interfacing C++ code]

Sat Apr 23 00:35:33 CEST 2011

On Apr 22, 2011, at 3:22 PM, Sharpie wrote:

> 
> smcguffee wrote:
>> 
>> Hi Charlie, 
>> 
>> Thanks for the help,
>> 
>> I think some of my story of having been reading the documentation and
>> playing with examples for weeks has gotten lost in the switch of threads. I
>> think most of that confusion also comes from me not figuring out how to
>> connect different sections of the documentation. I think I get it now that
>> just because I can do 'R CMD SHLIB X.cc X_main.cc' from a command line
>> doesn't mean that I need to put that command into a package directly, and
>> even that I can't explicitly put that line in a package because it's
>> magically done for me. I appreciate folks having patience with me as some of
>> my questions seem redundant, but it is all starting to come together for me.
>> 
> 
> When I first started out extending R with compiled code, I used R CMD SHLIB
> as well. Don't know why exactly, it was probably the first thing I stumbled
> across in the manual. Once I learned about making packages and that putting
> C, C++ or Fortran code in the `src` directory of the package magically
> caused a library to be built, I quit using R CMD SHLIB---probably haven't
> touched it in years.
> 
> I think R CMD SHLIB may be intended more for compiling external programs
> that want to hook into the R libraries rather than things intended to be
> loaded by R itself.
> 
> 
> 
> smcguffee wrote:
>> 
>> At this point I think I am beginning to get a good enough idea of how this
>> stuff is working on the R interface side of things. I pretty much just have
>> one more question:
>> 
>> How do I let users adjust their system specific paths to non-R libraries for
>> my package installation but not for everyone else's package installation? I
>> get the feeling users can control things in my package somehow through their
>> R configurations if I use the PKG_LIBS = `$(R_HOME)/bin/Rscript -e
>> "Rcpp:::LdFlags()"` command in the src/Makevars file. However, I'm still
>> lost as to how this would be customized to my package. I mean, that command
>> doesn't specify anything unique to my package and could potentially be used
>> by other package installations too. That file is inside my package, so I
>> don't think users can modify it directly and explicitly with their system
>> specific paths before they install. Maybe if other packages link to extra
>> libraries it doesn't hurt anything. Is that the answer? Would users need to
>> add all my requisite non-R libraries into their R configurations to get
>> `$(R_HOME)/bin/Rscript -e "Rcpp:::LdFlags()"` to link my package correctly
>> and let all other packages link to way more libraries than necessary?
>> 
> 
> Well, the best answers to this question lie inside the "Writing R
> Extensions" manual---specifically Section 1.2 "Configure and Cleanup".  The
> short version is:
> 
> - If the code in your package needs custom compiler flags, add a 
> `src/Makevars` file that contains them.
> 
> - If the code in your package depends on external libraries, add a
> `configure` script, written using GNU autotools, that will produce
> `src/Makevars` from a template `src/Makevars.in` that contains the `-L` and
> `-I` flags required to compile and link your code against the external
> library.
> 
> However, I will again suggest taking this one step at a time:
> 
>  - Build a toy package that includes C or C++ code that needs to be
> compiled. Observe how `R CMD INSTALL` compiles the code for you and how to
> use `.onLoad` or `.First.lib` to `dyn.load` the resulting library when a
> user runs `library` on your package.  Bonus points for reading enough of
> "Writing R Extensions" to know if having an R NAMESPACE in your package has
> any effect on this process.
> 
>  - Extend your toy package to include C++ code that needs custom compiler
> flags. See how you can achieve this with `src/Makevars`.
> 
>  - Extend your package again with an external dependency that requires a
> `configure` script. A good example of such a package is `rgdal`---it has to
> find both the GDAL and PROJ4 libraries in order to compile operational code.
> 

I would just discourage looking at what it does to detect proj4 - that's not how configure should be written. Running the compilation by hand is very error prone (hard to debug, messes up config.log) and goes against the idea of autoconf - that's what AC_COMPILE_IFELSE, AC_LINK_IFELSE and friends are for. However, it does get other major things right (such as retrieving basic flags and compilers from R).
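
A minimal sketch of letting autoconf run the link test itself in configure.ac. It assumes a hypothetical library "foo" that ships foo.h and exports foo_init(); neither name comes from rgdal or PROJ.4, so substitute the real header and symbol:

```m4
dnl Hypothetical library: foo.h and foo_init are placeholders.
dnl AC_LINK_IFELSE links using LIBS, so the library must be added first.
LIBS="-lfoo ${LIBS}"

AC_MSG_CHECKING([whether a program using foo_init links])
AC_LINK_IFELSE(
  [AC_LANG_PROGRAM([[#include <foo.h>]],
                   [[foo_init();]])],
  [AC_MSG_RESULT([yes])],
  [AC_MSG_RESULT([no])
   AC_MSG_ERROR([cannot link against foo, see config.log for details])])
```

Because the macro drives the compiler, the exact command line and the failing test program end up in config.log, which is what gets lost when the compilation is run by hand.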

This is getting a bit off-topic, but I think it's important. Major points to keep in mind when writing configure scripts (IMHO):

1) always ask R for compiler and flags before using them (i.e., before AC_PROG_CC).
2) always give the user a way to override defaults (defaults are always wrong at some point by definition) - preferably in a standard way
3) make sure the flags you check with (CPPFLAGS, CFLAGS, LDFLAGS, LIBS, ..) reflect what will be used by R from Makevars
4) make sure you test dependencies - if you don't, static libraries will fail so it's a good habit to test with static libraries

People very often forget about 3) -- all tests in autoconf are run respecting the standard flag environment variables, but if you set, let's say, PKG_LIBS to include only your additions (like -lfoo), then R will be using entirely different flags than autoconf, rendering the whole test useless.
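
To make points 1) to 3) concrete, here is a rough configure.ac sketch, not a definitive recipe. The package name "mypkg", the library "foo" (foo.h, foo_init) and the variables FOO_CPPFLAGS and FOO_LIBS are all hypothetical names chosen for illustration; the `R CMD config` calls follow the pattern shown in "Writing R Extensions". It assumes a src/Makevars.in containing the two lines `PKG_CPPFLAGS = @PKG_CPPFLAGS@` and `PKG_LIBS = @PKG_LIBS@`.

```m4
dnl Hypothetical package name and version.
AC_INIT([mypkg], [0.1])

dnl 1) Ask R for the compiler and flags before AC_PROG_CC.
: ${R_HOME=`R RHOME`}
if test -z "${R_HOME}"; then
  AC_MSG_ERROR([could not determine R_HOME])
fi
CC=`"${R_HOME}/bin/R" CMD config CC`
CFLAGS=`"${R_HOME}/bin/R" CMD config CFLAGS`
CPPFLAGS=`"${R_HOME}/bin/R" CMD config CPPFLAGS`
LDFLAGS=`"${R_HOME}/bin/R" CMD config LDFLAGS`
AC_PROG_CC

dnl 2) Let the user override the defaults (hypothetical variable names).
AC_ARG_VAR([FOO_CPPFLAGS], [preprocessor flags for foo, e.g. -I/opt/foo/include])
AC_ARG_VAR([FOO_LIBS], [linker flags for foo, e.g. -L/opt/foo/lib -lfoo])
: ${FOO_LIBS=-lfoo}

dnl 3) Run the tests with the same flags that Makevars will hand to R.
CPPFLAGS="${CPPFLAGS} ${FOO_CPPFLAGS}"
LIBS="${FOO_LIBS} ${LIBS}"
AC_CHECK_HEADER([foo.h], [],
  [AC_MSG_ERROR([foo.h not found, set FOO_CPPFLAGS])])
AC_CHECK_FUNC([foo_init], [],
  [AC_MSG_ERROR([foo_init not found, set FOO_LIBS])])

dnl Export exactly what was tested, not just the additions.
PKG_CPPFLAGS="${FOO_CPPFLAGS}"
PKG_LIBS="${LIBS}"
AC_SUBST([PKG_CPPFLAGS])
AC_SUBST([PKG_LIBS])

AC_CONFIG_FILES([src/Makevars])
AC_OUTPUT
```

With something like this, a user could point only this package at a non-standard location, for instance with `R CMD INSTALL --configure-vars='FOO_CPPFLAGS=-I/opt/foo/include' mypkg`, without touching the flags used for any other package.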

Cheers,
Simon

> If you run into any trouble along the way, stop and read "Writing R
> Extensions". If you really get stuck, you can then ask the mailing list a
> very focused question along with an example that shows what is going wrong
> for you. Then you have a good chance of getting helpful answers.  Right now
> your questions are spanning the entire spectrum from beginning to advanced
> package authoring and so the most likely answer you will get from the list
> is "slow down and read the manual".
> 
> 
> 
> 
> smcguffee wrote:
>> 
>> Thanks for your help,
>> 
>> Sean
>> 
>> P.S.
>> 
>> The rest of this message is my rambling, so only those interested in my
>> thoughts should continue reading. Especially those interested in sparing
>> their own time should stop reading here--the question above is my last
>> inquiry for the list. What comes below is just my train of thoughts/flow of
>> consciousness spewing needlessly.
>> 
>> It was definitely a good idea for me to look in the R source code. It seems
>> that dynload.c names.c dotcode.c Rdynload.c were of most interest to me in
>> understanding that magical unicorn with an adorable animated cartoon story.
>> I found that link quite enjoyable by the way! Regarding the files I just
>> mentioned, I notice that the code is in the form of c files and that quite a
>> lot of info from library files is used to get function pointers in the
>> functions of interest to me. I wonder if making those files into cpp files
>> that would get compiled with a c++ compiler would let them call c++
>> functions directly or if the info to get the function pointers would be of a
>> completely different type of syntax and/or if there is more to that story. I
>> suppose it makes no difference in practice because one would probably still
>> have to make a c++ wrapper function to interface with R, but I'm just
>> curious about this stuff. I mean, in principle, it makes sense to be able to
>> call a function directly without having to go through the trouble of
>> wrapping it in c, especially for hundreds of C++ functions in a library. It
>> might be that I can write one general argument handling function in C as is
>> to interface with R and let it call any of my C++ functions in my libraries,
>> slightly shortening my tasks.
>> 
>> Anyway, it was really eye opening to see that R is actually calling its own
>> generic pointers to functions and just pre-assigning them to function
>> pointers from libraries. I didn't know that could be done, and I imagine
>> hackers must love that capacity, a capacity that seems to be inherent in c
>> or c++. It does seem a little bit limiting that the arguments are limited in
>> number and that each function pointer with a different number of arguments
>> has to be conditionally called inside the R code. However, I have the same
>> complaint about bash having a limit on the total size of data that can be
>> passed as arguments into an executable. It looks to me like fixing that type
>> of thing in bash requires recompiling the kernel because it's hard wired
>> non-dynamically into the capacity of launching executables themselves. I
>> hope this type of thing starts to change as hardware is way exceeding the
>> original expectations of the non-dynamically allocated original design of
>> executable launch and dynamic allocation has clearly demonstrated its
>> superiority in general. That type of thing comes into issue for me on
>> command line scripts when I sometimes have lists of files that are longer
>> than the capacity of command line arguments. For example, a
>> "grep someText *" or "ls *" will only work if the size of the arguments in
>> the * expansion is less than the system's capacity for arguments passed to
>> executables. I hit that limit all the time, and that's annoying because
>> scripts that normally work break in larger situations, rendering their
>> applicability useless in what are typically more interesting cases.
>> 
>> Anyway, that's all a tangent from this R interfacing stuff. However, it was
>> news to me that R could have a similar type of limit for functions in
>> packages until I looked into the code. I don't think this is an issue in R
>> because I'll just design one Rcpp argument to contain all the info I need
>> inside itself. However, it's good to know that I need to do that. Anyway,
>> I'm also wondering if it might be easier to modify compilers themselves
>> and/or incorporate their code into R's code, i.e. easier than doing all this
>> work around to fit into their mold. In a way that is sort of done to access
>> the function pointers from libraries, but I mean, it seems logical that a
>> program such as R should be able to call any function with any number of
>> arguments abstractly without needing to have the functions get conditionally
>> called with a given number of arguments at compile time for R. I can imagine
>> converting a string to a call to a number of arguments that is determined by
>> the syntax of the string without being defined before the compilation of R.
>> That type of idea, if possible, could allow a more dynamic range of options
>> in packages, at least not limited by a number of arguments. Like I said,
>> that's not important because one argument can contain an endless amount of
>> info, but it sparked my curiosity. I might peek at GNU's gcc compiler
>> collection to see if I can come up with some ideas for that type of
>> thing--basically building dynamic compilation and execution options, but I
>> imagine it would be way over my head, a long time coming, and of course
>> potentially unstable. The long and short of it for me is that it was way
>> cool to see how R is calling C functions from packages or non-R libraries.
>> 
> 
> Quite a brain dump there!  Some things that you may want to look into in the
> future:
> 
>  - The original mailing list you posted to, Rcpp, is for an R package that
> wraps the C API of R into C++ classes.  I would bet it also provides methods
> for calling R code and C++ without having to write as many R functions.  I
> have not had the pleasure of using Rcpp yet---Fortran was my first compiled
> language and I am still moving my way up the food chain :)
> 
>  - The inline package may be of interest to you---It allows C, C++ and
> Fortran programs to be stored as text strings at the R level and then
> dynamically compiled, loaded and interfaced.  Could be along the lines of
> what you were thinking about with "building dynamic compilation and
> execution options".
> 
>  - Also, it is always fun to drop by the Omegahat project
> (www.omegahat.org) and see what Duncan Temple Lang has been cooking up. He
> has a couple of packages for interfacing R with compiled code via LibFFI
> (rather than the built in pointer method you observed) and one package that
> has the beginnings of some LLVM bindings.
> 
> 
> -Charlie
> 
> 
> On 4/21/11 10:02 PM, "Sharpie" <chuck at sharpsteen.net> wrote:
> 
>> 
>> smcguffee wrote:
>>> 
>>> You are right, I looked and I did find the R source code. However, it's
>>> largely written in R! I mean, I don't know how to trace the R code where
>>> INSTALL is recognized and follow it to a c or c++ level command. For example
>>> these are hits in .R files, not c files, and I don't know how to connect
>>> 
>>> ...
>>> 
>>> If you could point me to the functions that are called at a c or c++ level,
>>> I'd love to see what R is doing for myself.
>>> Thanks!
>>> Sean
>>> 
>> 
>> Hi Sean!
>> 
>> Along with many other people in this thread, I would strongly recommend a
>> top-down approach to this. Build a package, stick some stuff in the src
>> folder, run R CMD INSTALL on it and see what happens. The reason I recommend
>> this approach is that it lets you focus on writing a package that does
>> something useful rather than the nuts and bolts of cross platform
>> compilation and installation. R CMD INSTALL takes care of this for you
>> automagically and it is very good at what it does.
>> 
>> I wrote a post some time back about building an example package from scratch
>> that contains C code:
>> 
>> http://r.789695.n4.nabble.com/Writing-own-simulation-function-in-C-td1580190.html#a1580423
>> 
>> It begins with using the package.skeleton() function to kickstart
>> things, discusses how to make sure the compiled code is dynamically loaded
>> when a user runs library(your_package) and even discusses how to call R
>> functions from inside of C functions and vice-versa. The example code is
>> still available and I'm sure it could be generalized to C++ quite easily.
>> There are also some other responses in that thread that offer useful advice.
>> 
>> 
>> At the beginning it is just best to treat R CMD INSTALL as a magical unicorn
>> that gets you where you need to go:
>> 
>> http://abstrusegoose.com/120
>> (keep clicking the images to get the full story)
>> 
>> 
>> If you are absolutely, positively dying to know what really happens... well,
>> the relevant files in the R source are `src/library/tools/R/install.R` and
>> `src/library/tools/R/build.R`.
>> 
>> 
>> But seriously. Magical unicorn. Takes care of the hard stuff so you can
>> build awesome packages.
>> 
>> Hope this helps!
>> 
>> -Charlie
> 
> 
> -----
> Charlie Sharpsteen
> Undergraduate-- Environmental Resources Engineering
> Humboldt State University