[Rd] simulate in stats

Thu Sep 29 12:08:05 CEST 2005

>>>>> "PaulG" == Paul Gilbert <pgilbert at bank-banque-canada.ca>
>>>>>     on Thu, 15 Sep 2005 12:07:48 -0400 writes:

    PaulG> BTW, I think there is a problem with the way the
    PaulG> argument "seed" is used in the new simulate in stats.
    PaulG> The problem is that repeated calls to simulate using
    PaulG> the default argument will introduce a new pattern
    PaulG> into the RNG:

    >> stats:::simulate
    PaulG> function (object, nsim = 1, seed = as.integer(runif(1, 0, 
    PaulG> .Machine$integer.max)),   ...)
    PaulG> UseMethod("simulate")
    PaulG> <environment: namespace:stats>

    >> stats:::simulate.lm
    PaulG> function (object, nsim = 1, seed = as.integer(runif(1, 0, 
    PaulG> .Machine$integer.max)),    ...)
    PaulG> {
    PaulG> if (!exists(".Random.seed", envir = .GlobalEnv))
    PaulG> runif(1)
    PaulG> RNGstate <- .Random.seed
    PaulG> set.seed(seed)
    PaulG> ...

    PaulG> This should not be done, as the resulting RNG has not
    PaulG> been studied or proven. A better mechanism is to have
    PaulG> a default argument equal NULL, and not touch the seed
    PaulG> in that case.

I agree so far.  I think the default seed should really be NULL
(with your semantic!) rather than a random number.

    PaulG>  There are several examples of this in
    PaulG> the package dse1 (in bundle dse), see for example
    PaulG> simulate.ARMA and simulate.SS. They also use the
    PaulG> utilities in the setRNG package to save more of the
    PaulG> information necessary to reproduce
    PaulG> simulations. Roughly it is done like this:

    PaulG> simulate.x <- function (model, rng = NULL,  ...)
    PaulG>   {if (is.null(rng)) rng <- setRNG() 
    PaulG>     ## returns the RNG setting to be  saved with the result
    PaulG>   else {
    PaulG>     old.rng <- setRNG(rng)
    PaulG>     on.exit(setRNG(old.rng))
    PaulG>   }
    PaulG> ...

as nobody has further delved into this in the mean time,
this is definitely too late for R 2.2.0, even if it was desired.

But I also think you should be able to live with interpreting
'seed' as 'rng' if you want, shouldn't you?

    PaulG> The seed by itself is not very useful if the purpose
    PaulG> is to be able to reproduce things, and I think it
    PaulG> would be a good idea to incorporate the few small
    PaulG> functions setRNG into stats (especially if the
    PaulG> simulate mechanism is being introduced).

maybe we should reopen this topic {adopting ideas or even exact
implementations from your 'setRNG' into stats} in a few weeks,
when R 2.2.0 is released.

    PaulG> The argument "nsim" presumably alleviates to some
    PaulG> extent the above concern about changing the RNG
    PaulG> pattern. However, in my fairly extensive experience
    PaulG> it is not very workable to produce all the
    PaulG> simulations and then do the analysis of them. 
    PaulG> In a Monte Carlo experiment the generated data set is just
    PaulG> too big.

I believe this depends very much on the topic.  The simulate()
uses that we had envisaged with simulate() don't save all the
models and then analyze them.  
But maybe I'm misunderstanding your point completely here.

    PaulG>  A better approach is to do the analysis and save
    PaulG> only necessary information after each
    PaulG> simulation. That is the approach, for example, in
    PaulG> dse2:::EstEval.

    PaulG> Paul

    PaulG> Paul Gilbert wrote:

    >> Can the arguments nsim and seed be passed as part of ... in the new 
    >> simulate generic in R-2.2.0alpha package stats?

    >> This would potentially allow me to use the stats generic rather than 
    >> the one I define in dse. There are contexts where nsim and seed do not 
    >> make sense.

Well, the current specification for simulate() has been different
explicitly.

I agree that there are situations where both 'nsim' and 'seed'
(or a generalization, say 'RNGstate') wouldn't make sense and
one still would like to use something like "simulate" in the
function name. 

    >> I realize that the default arguments could be ignored, but 
    >> it does not really make sense to introduce a new generic with that in 
    >> mind. 

I think it would depend on the exaxt context if I would rather
use a (slightly) different function name, or just ignore the
ignorable arguments as you mention.

    >> (I would also prefer that the "object" argument was called 
    >>  "model" but this is less important.)

I'd personally agree with that;  the argument was that
'object' is very generally used in such situations.

Martin Maechler