[R] Writing own simulation function in C

Sat Mar 6 05:51:15 CET 2010

TheSavageSam wrote:
> 
> I am wishing to write my own random distribution simulation function using
> C programmin language(for speed) via R. I am familiar with R programming
> but somewhat new to C programming. I was trying to understand "Writing R
> extensions" -guide and its part 6.16, but I found it hard to
> understand(http://cran.r-project.org/doc/manuals/R-exts.html#Standalone-Mathlib).
> I also tried to get familiar with the example codes, but it didn't make me
> any wiser.
> 
> The biggest problem seems to be how to get(what code to write in C) random
> uniform numbers using Rmath. That seems to be too complicated to me at the
> moment. So if someone of you could give a small example and it would
> definitely help me a lot. All I wish to do first is to finally write(and
> understand) my own function similar to what you run in R Command line via
> command "runif".
> 
> And all this I am hoping to do without recompiling my whole R. Instead of
> that I think that it is possible to use dyn.load("code.so") in R. (Yes, I
> use linux)
> 

If the code in this message gets mangled by the mailing list, try viewing it
on nabble:

http://n4.nabble.com/Writing-own-simulation-function-in-C-td1580190.html#a1580190

...

Actually, funny story... I wrote this post using Nabble, but the Spam
Detector refuses to let me post it claiming that there are "too many sex
words" due some constructs in the R language.  So I guess this confirms that
using R is the quickest way to increase your statistical manhood/womanhood.

As a result of this, all code chunks have been moved to a gist at GitHub--
links are provided.

...

I would suggest first learning how to implement callbacks to R functions
from C functions rather than diving straight into calling the C
implementations of R functions.  The reasons are:

  1.  You already know how the R functions is supposed to be called.

  2.  You don't have to search through the R source to track down the C
implementation of the function you are after.

I would also bet that the overhead of calling an R function that calls it's
compiled implementation is not that significant-- but I haven't done any
profiling.

The way I would approach this problem would be to create a new package since
the package structure helps manage the steps involved in loading and testing
the compiled code.

So, here's an example of calling the R function runif() from C (tested on
Linux Mint 8 with R 2.10.1):

Start an R terminal and execute the following:

  setwd( 'path/to/wherever/you/like/to/work' )

  # Create a dummy wrapper function to start the package with
  myRunif <- function( n, min = 0, max = 1 ){}

  # Start the package
  package.skeleton( 'RtoC' )

Now, quit R.  You'll notice that a folder called RtoC appeared in the
directory where R was working.  Edit RtoC/R/myRunif.R to contain the
following:

  see:
  http://gist.github.com/323498#file_my_runif.r

The above R function is a wrapper for our compiled C function.  It states
that we will be calling a C routine named "myRuinf" and passing the
parameters n, min and max.  The .Call() interface will pass these arguments
as separate SEXP objects, which are the C-level representation of R objects. 
The alternatives to .Call are .External which gathers all the arguments up
and passes them as a single SEXP and .C() which passes them directly as
basic C variables (care must be used with .C(), you may need to wrap things
in as.double(), as.integer(), etc.).

I am using .Call() over .C() because .Call() allows an SEXP (i.e. native R
object) to be returned directly.   .External() also returns a SEXP, but we
are calling back to an R function so we would have to split each argument
into a separate SEXP anyway.  With .C() we would have to coerce the C types
to separate SEXP variables. Furthermore, with .C(), an additional variable
to hold the return value would have to be included as an argument the
function as we are expecting to return a vector and all the arguments are
scalars.

So, now the task is to write the C implementation of myRunif.  Create the
following folder in the package:

  mkdir  RtoC/src

Any C or Fortran code placed in the /src folder of an R package will be
automagically compiled to a shared library that will be included when the
package is installed.  So, let's add a C routine, RtoC/src/myRunif.c:

  see:
  http://gist.github.com/323498#file_m_runif.c

The first thing that myRunif.c does is to locate the namespace of the stats
function.  This is important as runif() lives in that namespace and we will
be executing a call to it later.  The first thing to note is that results
returned by R are assigned inside of a PROTECT() statement.  This is because
SEXPs are *pointers* to data and we don't want the R garbage collecter to
munch the data we're pointing to before we get a chance to use it.

After finding the stats namespace, the second operation is to set up the
call back to the R function ruinf()-- this is started by creating a vector
of type "language" that has a length of 4-- one slot for the function name
and then three more slots because runif takes three arguments.

Next, the slots of the LANGSXP are filled- first by locating the name of the
function we wish to call using findFun() and then by filling in the
arguments.  The assignment of each argument is followed by a call to SET_TAG
which tags the argument with it's name in the R function.  Basically, lines
18-29 construct an R function call of the form:

  runif( n = n, min = min, max = max )

Next, a SEXP is created to hold the results of the function call that was
constructed.  The variable is then assigned the results of the function call
inside a PROTECT()

Finally, UNPROTECT(3) is called before the results are returned to R.  This
is don because we called PROTECT() 3 times.  If you don't call UNPROTECT(),
you will see warnings about a "Stack Imbalance" when running your code in R.

It is worth noting that I am being incredibly verbose with lines 15-36, I
could have replaced them with:

  see:
  http://gist.github.com/323498#file_my_runif_concise.c

In the case above, lang4() provides a shortcut for building a LANGSXP with 4
slots and the call arguments are matched by position.  I showed the long
version in case you want to execute a call with more than 3 arguments or you
need to execute a call using key=value matching.

Finally, you need to add one more routine that causes the dynamic library to
be loaded when the R package is loaded.  Since the package doesn't have a
namespace, we'll use the function .First.lib() and place it in RtoC/R/zzz.R:

  see:
  http://gist.github.com/323498#file_zzz.r

If you add a namespace to your package, you will need to use .onLoad()
instead of .First.lib().

Now build and install the package:

  R CMD build RtoC
  R CMD INSTALL RtoC_0.1.tar.gz

You should be able to execute the following in R:

  library( RtoC )

  myRunif( 10 )

[1] 0.8072035 0.3978969 0.6492380 0.3614537 0.9150126 0.7287221 0.4242656
[8] 0.9401839 0.8275597 0.7270001

  myRunif( 10, -10, 10 )

[1]  0.669904 -7.603356 -6.632470  6.753585 -9.424831  8.758971 -4.430569
[8]  3.737836 -5.297374 -9.968060

Happy hacking!

-Charlie

-----
Charlie Sharpsteen
Undergraduate-- Environmental Resources Engineering
Humboldt State University
-- 
View this message in context: http://n4.nabble.com/Writing-own-simulation-function-in-C-tp1580190p1580423.html
Sent from the R help mailing list archive at Nabble.com.