[Rd] naive question regarding running parallel C code from R

Dirk Eddelbuettel edd at debian.org
Fri Apr 18 18:52:24 CEST 2008


On 18 April 2008 at 13:03, tyler wrote:
| Hi,
| 
| I have only the vaguest notion of what parallel programming is, but I think
| I have a situation where it might be of use to me, or at least provide
| me with the opportunity to learn more about it. Before I invest in
| figuring out the nuts and bolts, can anyone confirm that this is a sane
| approach, or provide alternatives that I could pursue?
| 
| I'm running stochastic simulations, with the actual simulation in C
| code, with an R interface to set up the parameters, format the output,
| and save the resulting objects periodically through the run. The basic
| layout is:
| 
| R function sets up the run
| R for(c = 0; c < CYCLES; c++)
|   call C function
|   C for(i = 0; i < TIME; i++)
|     immigration loop adds individuals to the recruit vector
|     birth loop adds individuals to the recruit vector
|     recruit vector is added to the community vector
|     death loop removes excess individuals
|   Return results to R, which processes and saves the objects
|   Repeat
| 
| A typical run has 20 cycles, each with 500 time steps, and takes about
| an hour. The immigration and birth loops are independent of each other,
| and so could run simultaneously. They both add to the recruit vector,
| but the order of the addition doesn't matter so long as both finish
| before the recruit vector is added to the community vector. The
| immigration, birth, and death loop iterate over arrays in a way that the
| outcome at different locations is independent. i.e., the impact of the
| birth vector on recruit vector position 0 has no influence on what the
| birth vector does to recruit vector position 1.
| 
| What I'm thinking of doing is running the birth and immigration loops as
| separate threads, and possibly running each of those threads as a group
| of threads - so a thread for a birth loop that iterates over the first N
| positions, another thread for the second N positions and so on.
| 
| I'm keen to learn about parallel programming, but I don't understand
| enough yet to make sense of the information in the R extensions manual
| and the various discussions on this list about R being thread-safe. Does
| it matter if R is thread-safe if the actual simulation is being computed
| in separate, shared C code?
| 
| I'm running my current, sequential code on a cluster that supports both
| OpenMP and MPI, should I figure out how to use them.

As I recall, you use Debian, so run

   $ sudo apt-get install r-cran-snow r-cran-rmpi

to get R support working out of the box. Then study the examples for Snow on
Luke's website. [ I also have some slides on my website from presentations I
gave a few years ago. ]

That allows you to _easily_ handle the so-called embarrassingly parallel case:
same problem, different parameters.  Or you could spread the loop over CYCLES
across the cluster.  Start with something simple to study it and then go from there.
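
For the finer-grained idea in the question (splitting the work inside one
time step across positions), here is a minimal OpenMP sketch in C. It is
only an illustration under stated assumptions: births_at(), immigrants_at(),
time_step() and the capacity cap are hypothetical stand-ins, since the real
model's rules are not given in this thread. Because each position is
independent, the sketch parallelises over positions and handles immigration
and birth for a position in the same iteration, rather than running the two
loops as separate threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-position rules; placeholders for the real model. */
static int births_at(const int *community, long i) { return community[i] / 2; }
static int immigrants_at(long i) { return (int)(i % 3); }

/* One time step over n positions.
 * No R API calls (including R's random number generator) are made inside
 * the parallel loops; this sketch assumes all interaction with R happens
 * before and after time_step(), on the main thread. */
void time_step(int *community, int *recruit, size_t n, int capacity)
{
    long i; /* signed loop index, private to each thread in the loops below */

    assert(community != NULL && recruit != NULL);

    /* Immigration and birth: each position writes only recruit[i]. */
    #pragma omp parallel for
    for (i = 0; i < (long)n; i++) {
        recruit[i] = immigrants_at(i) + births_at(community, i);
    }

    /* The first loop ends with an implicit barrier, so every recruit[i]
     * is complete before it is merged into the community vector. */
    #pragma omp parallel for
    for (i = 0; i < (long)n; i++) {
        community[i] += recruit[i];
        if (community[i] > capacity) {
            community[i] = capacity; /* "death" removes the excess */
        }
    }
}
```

With GCC this needs -fopenmp to actually run in parallel; without the flag
the pragmas are ignored and the same code runs sequentially, which makes it
easy to check that both versions give the same answer. A real stochastic
version would also need a random number source that is safe to use from
several threads at once.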

Hope this helps, Dirk

--
Three out of two people have difficulties with fractions.


