[Rd] portable parallel seeds project: request for critiques

Petr Savicky savicky at cs.cas.cz
Wed Feb 22 22:15:09 CET 2012

On Wed, Feb 22, 2012 at 12:17:25PM -0600, Paul Johnson wrote:
> In order for this to be easy for users, I need to put the init streams
> and set current stream functions into a package, and then streamline
> the process of creating the seed array.  My opinion is that CRAN is
> now overflowed with too many "one function" packages, I don't want to
> create another one just for these two little functions, and I may roll
> it into my general purpose regression package called "rockchalk".

I am also preparing solutions to this problem. One is based on AES,
used to initialize the Mersenne-Twister generator of base R, so it
replaces only the set.seed() function. Another is based on the
"rlecuyer" package. I suggest that we discuss the possible solutions
off-list before submitting anything to CRAN.
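The general idea of the first approach, deriving a complete Mersenne-Twister state for each stream from a master seed and a stream number, can be illustrated with a short C sketch. This is not the AES-based code mentioned above: to keep the example small and self-contained it substitutes the SplitMix64 mixing function for AES, and the names mt_state_for_stream() and splitmix64_next() are hypothetical. Only the state size of 624 32-bit words is taken from the standard MT19937 generator.

```c
#include <stdint.h>

#define MT_N 624  /* number of 32-bit words in an MT19937 state */

/* One SplitMix64 step: a stand-in for AES, used here only for brevity. */
static uint64_t splitmix64_next(uint64_t *x)
{
    uint64_t z = (*x += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Hypothetical: fill the MT19937 state for stream `stream` of master
   seed `master`. The pair is packed into one 64-bit value, so the same
   (master, stream) pair always yields the same state. */
void mt_state_for_stream(uint32_t master, uint32_t stream, uint32_t state[MT_N])
{
    uint64_t packed = ((uint64_t)master << 32) | stream;
    /* Mix the packed pair once, so consecutive stream numbers do not
       start from consecutive counter values. */
    uint64_t x = splitmix64_next(&packed);
    for (int i = 0; i < MT_N; i++)
        state[i] = (uint32_t)(splitmix64_next(&x) >> 32);  /* keep the high half */
}
```

Loading the 624 words into R's generator, and guarding against the degenerate all-zero MT19937 state, is left out of the sketch.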

> One technical issue that has been raised to me is that R parallel's
> implementation of the L'Ecuyer generator is based on integer valued
> variables, whereas the original L'Ecuyer code uses double real
> variables.  But I'm trusting the R Core on this, if they code the
> generator in a way that is good enough for R itself, it is good enough
> for me. (And if that's wrong, we'll all find out together :) ).

I do not know of any L'Ecuyer generator in base R. You probably mean
the authors of the extension packages that provide these generators.
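On the integer-versus-double question in the quoted text: in MRG32k3a, the combined multiple recursive generator used by L'Ecuyer's RngStreams code (which "rlecuyer" wraps), every intermediate value is an integer smaller than 2^53 in magnitude, so double arithmetic on it is exact. An implementation with 64-bit integers and one with doubles should therefore produce identical numbers. The C sketch below was written for this note rather than copied from any package; the constants are the published MRG32k3a parameters, and the function names are illustrative.

```c
#include <stdint.h>

/* Published MRG32k3a parameters (L'Ecuyer, 1999). */
#define M1   4294967087LL
#define M2   4294944443LL
#define A12  1403580LL
#define A13N 810728LL
#define A21  527612LL
#define A23N 1370589LL
#define NORM 2.328306549295727688e-10   /* 1 / (M1 + 1) */

/* One step with 64-bit integers. s[0..2] holds the first component
   (values in [0, M1)), s[3..5] the second (values in [0, M2)). */
double mrg32k3a_int(int64_t s[6])
{
    int64_t p1 = A12 * s[1] - A13N * s[0];
    p1 -= (p1 / M1) * M1;             /* C division truncates toward zero */
    if (p1 < 0) p1 += M1;
    s[0] = s[1]; s[1] = s[2]; s[2] = p1;

    int64_t p2 = A21 * s[5] - A23N * s[3];
    p2 -= (p2 / M2) * M2;
    if (p2 < 0) p2 += M2;
    s[3] = s[4]; s[4] = s[5]; s[5] = p2;

    return (double)(p1 > p2 ? p1 - p2 : p1 - p2 + M1) * NORM;
}

/* The same step in double precision, in the style of L'Ecuyer's code.
   Products and differences are exact integers below 2^53; only the
   quotient may be rounded, and the sign correction absorbs that. */
double mrg32k3a_double(double s[6])
{
    double p1 = (double)A12 * s[1] - (double)A13N * s[0];
    p1 -= (double)(int64_t)(p1 / (double)M1) * (double)M1;
    if (p1 < 0.0) p1 += (double)M1;
    s[0] = s[1]; s[1] = s[2]; s[2] = p1;

    double p2 = (double)A21 * s[5] - (double)A23N * s[3];
    p2 -= (double)(int64_t)(p2 / (double)M2) * (double)M2;
    if (p2 < 0.0) p2 += (double)M2;
    s[3] = s[4]; s[4] = s[5]; s[5] = p2;

    return (p1 > p2 ? p1 - p2 : p1 - p2 + (double)M1) * NORM;
}
```

Running both functions from the same six seeds and comparing the outputs is a quick way to check, on a given platform, that the two styles of implementation agree.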

> Josef Leydold (the rstream package author) has argued that R's
> implementation runs more slowly than it ought to. We had some
> correspondence and I tracked a few threads in forums. It appears the
> approach suggested there is roadblocked by some characteristics deep
> down in R and the way random streams are managed.  Packages have only
> a limited, tenuous method to replace R's generators with their own
> generators.

To connect a user-defined generator to R, a package must provide two
entry points, "user_unif_rand" and "user_unif_init". The first allows
the generator to be called from runif() and similar functions. The
second connects the generator to the set.seed() function. If only one
extension package with a generator is loaded into an R session, these
entry points are sufficient. If the package provides several
generators, as "randtoolbox" does, it is easy to switch between them
using functions the package provides for this purpose. I think that
having several generator packages loaded simultaneously can be useful
for developing them, but it is not needed for using them in
applications.
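A minimal C sketch of such a package, assuming R's documented user-supplied RNG interface (see ?Random.user), in which R looks up the symbols user_unif_rand and user_unif_init in the loaded shared library. The two toy generators (a 32-bit LCG and Marsaglia's xorshift32) and the switch function select_generator() are hypothetical illustrations, not code from randtoolbox. Int32 is defined locally to keep the example self-contained; a real package would include R_ext/Random.h instead.

```c
typedef unsigned int Int32;   /* as in R_ext/Random.h; assumes 32-bit int */

static Int32 state;           /* state shared by both toy generators */
static double result;         /* R reads the number through a pointer */

/* Toy generator 0: LCG x <- 69069 x + 1 (mod 2^32). */
static double lcg_next(void)
{
    state = 69069u * state + 1u;
    return (state + 0.5) / 4294967296.0;   /* strictly inside (0, 1) */
}

/* Toy generator 1: Marsaglia's xorshift32; the state must be nonzero. */
static double xorshift_next(void)
{
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return (state + 0.5) / 4294967296.0;
}

static double (*current)(void) = lcg_next;

/* Hypothetical switch function; the pointer argument suits .C(). */
void select_generator(int *which)
{
    current = (*which == 1) ? xorshift_next : lcg_next;
}

/* Entry point used by runif() and similar functions. */
double *user_unif_rand(void)
{
    result = current();
    return &result;
}

/* Entry point used by set.seed(). */
void user_unif_init(Int32 seed_in)
{
    state = seed_in ? seed_in : 1u;   /* avoid xorshift's zero state at seeding */
}
```

Because both generators share one state variable, calling set.seed() after switching gives the newly selected generator a fresh, valid starting point.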

There are also two optional entry points, "user_unif_nseed" and
"user_unif_seedloc", which allow a package to support the variable
".Random.seed". The problem is that R copies the internal state of the
generator to ".Random.seed" by reading a specific memory location,
without notifying the package. So if the state must be transformed
into integers before it is stored in ".Random.seed", the package cannot
do this only when it is needed; the integer form has to be kept up to
date at all times.
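One way to live with this restriction is to keep the whole state permanently in integer form inside the buffer that user_unif_seedloc() exposes. The C sketch below assumes a hypothetical generator whose natural state is one 64-bit word (Marsaglia's xorshift64 is used for illustration). It reloads the state from the buffer on every call and writes it back immediately, so any copy R takes is current, and a buffer R has overwritten from ".Random.seed" is picked up automatically. Int32 mirrors the typedef in R_ext/Random.h.

```c
#include <stdint.h>

typedef unsigned int Int32;   /* as in R_ext/Random.h; assumes 32-bit int */

/* The 64-bit state lives permanently as two 32-bit words in the
   buffer that R reads and writes via user_unif_seedloc(). */
static Int32 seed_words[2];
static int nseed = 2;
static double result;

static uint64_t load_state(void)
{
    return ((uint64_t)seed_words[0] << 32) | seed_words[1];
}

static void store_state(uint64_t s)
{
    seed_words[0] = (Int32)(s >> 32);
    seed_words[1] = (Int32)s;
}

double *user_unif_rand(void)
{
    /* Read from the buffer each time, in case R has written to it. */
    uint64_t s = load_state();
    s ^= s << 13;   /* xorshift64 step */
    s ^= s >> 7;
    s ^= s << 17;
    /* Write back at once: R may copy the buffer at any moment
       without notifying the package. */
    store_state(s);
    result = ((double)(s >> 12) + 0.5) / 4503599627370496.0;  /* (k + 0.5) / 2^52 */
    return &result;
}

void user_unif_init(Int32 seed_in)
{
    /* The high word stays 0x9E3779B9, so the state is never zero,
       which xorshift64 requires. */
    store_state(0x9E3779B97F4A7C15ULL ^ seed_in);
}

int *user_unif_nseed(void)   { return &nseed; }
int *user_unif_seedloc(void) { return (int *) seed_words; }
```

This works only because the state has a cheap integer representation. A state with parts that cannot be stored as integers does not fit this pattern.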

In the package "rngwell19937", I included code that tries to determine
whether the user has changed ".Random.seed". The reason is that most of
the state consists of integers and is stored in ".Random.seed", but the
state also contains a function pointer, which is not stored. The
pointer can be recomputed from ".Random.seed", and this is done
whenever the package detects a change of ".Random.seed". This is not a
nice solution, so in "randtoolbox" we decided not to support
".Random.seed".
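The change-detection idea can be sketched in C as follows. This is not the rngwell19937 code: the two LCG variants and the parity rule that selects between them are hypothetical stand-ins for the integer state and the derived function pointer. The package keeps a shadow copy of the buffer as it last left it. Any mismatch at the next call means that someone else (for example, an assignment to ".Random.seed") has written the buffer, and the function pointer is rebuilt.

```c
#include <string.h>

typedef unsigned int Int32;   /* as in R_ext/Random.h; assumes 32-bit int */

#define NSEED 2
/* seed_buf[0] selects a variant, seed_buf[1] is the generator state.
   R reads and writes this buffer as .Random.seed. */
static Int32 seed_buf[NSEED];
/* Copy of seed_buf as the package itself last left it. */
static Int32 shadow[NSEED];
static int nseed = NSEED;
static double result;

typedef Int32 (*step_fn)(Int32);

static Int32 lcg_a(Int32 x) { return 69069u * x + 1u; }
static Int32 lcg_b(Int32 x) { return 1664525u * x + 1013904223u; }

/* Derived data that .Random.seed cannot hold: a function pointer. */
static step_fn step = lcg_a;

static void rebuild_derived(void)
{
    step = (seed_buf[0] & 1u) ? lcg_b : lcg_a;
    memcpy(shadow, seed_buf, sizeof seed_buf);
}

double *user_unif_rand(void)
{
    /* A mismatch means the buffer was written from outside:
       recompute the function pointer from the integer state. */
    if (memcmp(shadow, seed_buf, sizeof seed_buf) != 0)
        rebuild_derived();
    seed_buf[1] = step(seed_buf[1]);
    shadow[1] = seed_buf[1];
    result = (seed_buf[1] + 0.5) / 4294967296.0;
    return &result;
}

void user_unif_init(Int32 seed_in)
{
    seed_buf[0] = seed_in & 1u;   /* hypothetical rule for choosing the variant */
    seed_buf[1] = seed_in;
    rebuild_derived();
}

int *user_unif_nseed(void)   { return &nseed; }
int *user_unif_seedloc(void) { return (int *) seed_buf; }
```

One cost of this approach shows up directly in the code: the comparison runs on every call and takes time proportional to the state size, which for a 19937-bit state is 624 words.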

I understand that in parallel computing, working with ".Random.seed"
is a good paradigm, but if the generator provides other tools for
converting its state to an R object and restoring it as the active
state, then ".Random.seed" is not strictly necessary.
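With such tools, converting between the active state and an R object happens only when the user explicitly requests it, which avoids the ".Random.seed" problem described earlier. The C sketch below assumes that the functions are exposed to R through .C(). The names rng_next(), rng_get_state() and rng_set_state(), and the two LCG variants, are hypothetical and are not the API of randtoolbox or any other package.

```c
#include <string.h>

typedef unsigned int Int32;   /* assumes 32-bit int, as R does */

/* Active state of a hypothetical generator: an integer part plus a
   derived function pointer that has no integer representation. */
typedef struct {
    int variant;
    Int32 x;
    Int32 (*step)(Int32);
} rng_state;

static Int32 lcg_a(Int32 x) { return 69069u * x + 1u; }
static Int32 lcg_b(Int32 x) { return 1664525u * x + 1013904223u; }

static rng_state active = { 0, 1u, lcg_a };

double rng_next(void)
{
    active.x = active.step(active.x);
    return (active.x + 0.5) / 4294967296.0;
}

/* Export the state as two ints, converting only when asked.
   The pointer signature fits R's .C() interface. */
void rng_get_state(int *out)
{
    out[0] = active.variant;
    memcpy(&out[1], &active.x, sizeof active.x);
}

/* Make a previously exported state active again, rebuilding the
   function pointer from the integer data. */
void rng_set_state(const int *in)
{
    active.variant = in[0] & 1;
    memcpy(&active.x, &in[1], sizeof active.x);
    active.step = active.variant ? lcg_b : lcg_a;
}
```

From R, a call such as .C("rng_get_state", integer(2)) could return the snapshot as an ordinary integer vector. The snapshot could then be saved, sent to a worker, and passed back to rng_set_state().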

> Parallel Random Number Generation in C++ and R Using RngStream
> Andrew Karl · Randy Eubank · Dennis Young
> http://math.la.asu.edu/~eubank/webpage/rngStreamPaper.pdf

Thank you very much for this link.

All the best, Petr.
