[Rd] set.seed() in a package

Henrik Bengtsson henr|k@bengt@@on @end|ng |rom gm@||@com
Wed Oct 30 17:50:18 CET 2019


> On 30/10/2019 9:08 a.m., peter dalgaard wrote:
> > You can fairly easily work around that by saving and restoring .Random.seed.

This is actually quite tedious to get correct; it requires you to
under how and when .Random.seed is set, and what are valid values on
.Random.seed.   For instance, a common mistake (me too) is to reset to
.GlobalEnv$.Random.seed <- NULL in a fresh R session but this will
produce a warning on: ".Random.seed' is not an integer vector but of
type 'NULL', so ignored".  You end up having to do things such as:

  oseed <- .GlobalEnv$.Random.seed
  on.exit({
    if (is.null(oseed)) {
      rm(list=".Random.seed", envir= .GlobalEnv)
    } else {
      assign(".Random.seed", value=oseed, envir= .GlobalEnv)
    }
  })

to avoid that warning.  So, having support functions for this in base
R would be helpful, e.g.

  oseed <- base::getRandomSeed()
  on.exit(base::setRandomSeed(oseed))

Back to Marvin's point/question.  I think it would be useful if the
CRAN Policies would explicitly say that functions must not change the
random seed to a fixed one, without undoing it, unless the user
specifies it via an argument.  If not, there's a great risk it will
mess up statistical analysis.  Also, if there would a way to test
against such practices, which I think is really hard, I would be the
first one backing it up.

On a related note; there are packages that forward the .Random.seed
when loaded.  This is also an unfortunate behavior because you will
give different RNGs depending on that package was already loaded
before you called a function that depends on it or not.  For example,
if pkgA forwards the .Random.seed when loaded, the following is *not*
reproducible:

# User might or might not have loaded pkgA already
if (runif(1) < 0.5) loadNamespace(pkgA)

set.seed(0xBEEF)
loadNamespace(pkgA)
y <- runif(1)

I ran into this problem when doing some strict testing.  I argue this
also falls into the set of "bad" practices to be avoided.

/Henrik

On Wed, Oct 30, 2019 at 6:23 AM Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>
> On 30/10/2019 9:08 a.m., peter dalgaard wrote:
> > We commit a similar sin in the help pages, e.g.
> >
> > example(set.seed) ; runif(2)
> > example(set.seed) ; runif(2)
> >
> > gives you the same random uniforms both times. (Of course it isn't that much of an issue, since you would rarely be running examples before any serious simulations.)
>
> I think it's pretty common in example code, and that's justifiable.  But
> it could be avoided by using withr::with_seed() or something equivalent.
>
> Duncan Murdoch
>
> >
> > You can fairly easily work around that by saving and restoring .Random.seed. I wonder if that isn't also true of the cases using set.seed() for other reasons?
> >
> > -pd
> >
> >
> >> On 30 Oct 2019, at 13:46 , Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
> >>
> >> On 30/10/2019 3:28 a.m., Marvin Wright wrote:
> >>> Hi all,
> >>> I recently found several calls of set.seed() in a CRAN package. These calls are in a plot function, which could lead to unexpected behaviour. See https://github.com/sammo3182/interplot/issues/33 <https://github.com/sammo3182/interplot/issues/33> for a description of the problem.
> >>> I checked the CRAN repository policies and could not find anything about this. I would have expected a policy against setting fixed seeds somewhere in a package. Am I missing something?
> >>
> >> set.seed() writes .Random.seed in the user's global environment, which violates this policy:
> >>
> >> - Packages should not modify the global environment (user’s workspace).
> >>
> >> However, every call to a random number generator creates or modifies .Random.seed as well, and most of those are expected and shouldn't be flagged.  And interplot() is documented to do random simulations, so it would be expected to change the seed:  the issue is that given the same inputs it always changes it to the same thing.  I think that would be quite hard for a test to detect.
> >>
> >> Should it be a policy with no test?  Maybe, because I agree with you that interplot()'s set.seed(324) is bad practice.
> >>
> >> Duncan Murdoch
> >>
> >> ______________________________________________
> >> R-devel using r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



More information about the R-devel mailing list