[Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

Martin Maechler maechler at stat.math.ethz.ch
Fri Nov 3 10:39:35 CET 2017


>>>>> Tirthankar Chakravarty <tirthankar.lists at gmail.com>
>>>>>     on Fri, 3 Nov 2017 13:19:12 +0530 writes:

    > This is cross-posted from SO
    > (https://stackoverflow.com/q/47079702/1414455), but I now
    > feel that this needs someone from R-Devel to help
    > understand why this is happening.

Why R-devel -- R-help would have been appropriate:

It seems you have not read the help page for
set.seed, as I would expect from posters to R-devel.
Why would you use strings instead of integers if you *had* read it?

    > We are facing a weird situation in our code when using R's
    > [`runif`][1] and setting seed with `set.seed` with the
    > `kind = NULL` option (which resolves, unless I am
    > mistaken, to `kind = "default"`; the default being
    > `"Mersenne-Twister"`).

Again, this is not what the help page says; rather

 | The use of ‘kind = NULL’ or ‘normal.kind = NULL’ in ‘RNGkind’ or
 | ‘set.seed’ selects the currently-used generator (including that
 | used in the previous session if the workspace has been restored):
 | if no generator has been used it selects ‘"default"’.

but as you have > 90 (!!) packages in your sessionInfo() below,
why should we (or you) know if some of the things you did
before or (implicitly) during loading all these packages did not
change the RNG kind ?
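
To see which generator is actually in effect, one can query RNGkind()
before seeding and name the kind explicitly instead of relying on
kind = NULL. This is only a sketch; the seed value 1 is an arbitrary
choice for illustration and does not come from the original post.

```r
# Report the RNG kinds currently in effect; the first element is the
# uniform generator, e.g. "Mersenne-Twister".
print(RNGkind())

# Name the generator explicitly instead of relying on kind = NULL,
# which keeps whatever generator is currently in use.
set.seed(1, kind = "Mersenne-Twister")   # the seed 1 is arbitrary
current_kind <- RNGkind()[1]
print(current_kind)
```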

    > We set the seed using (8 digit) unique IDs generated by an
    > upstream system, before calling `runif`:

    >     seeds = c( "86548915", "86551615", "86566163",
    > "86577411", "86584144", "86584272", "86620568",
    > "86724613", "86756002", "86768593", "86772411",
    > "86781516", "86794389", "86805854", "86814600",
    > "86835092", "86874179", "86876466", "86901193",
    > "86987847", "86988080")

    >  random_values = sapply(seeds, function(x) {
    >   set.seed(x)
    >   y = runif(1, 17, 26)
    >   return(y)
    > })

Why do you do that?

1) You should set the seed *once*, not multiple times in one simulation.

2) Assuming that your strings are correctly translated to integers
   and the same on all platforms, independent of locales (!) etc,
   you are again not following the simple instruction on the help page:

     ‘set.seed’ uses a single integer argument to set as many seeds as
     are required.  It is intended as a simple way to get quite
     different seeds by specifying small integer arguments, and also as
     .....
     .....

Note:   ** small ** integer 
Why do you assume   86901193  to be a small integer ?
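
Points 1) and 2) could be sketched as follows: convert the IDs to
integers explicitly, seed once with a small integer, and draw all
values in one call. The seed 1 is an arbitrary choice, and the sketch
assumes that the values need not be reproducible per individual ID.

```r
# IDs as delivered by the upstream system (strings, from the original post)
seeds <- c("86548915", "86551615", "86566163", "86577411", "86584144",
           "86584272", "86620568", "86724613", "86756002", "86768593",
           "86772411", "86781516", "86794389", "86805854", "86814600",
           "86835092", "86874179", "86876466", "86901193", "86987847",
           "86988080")

# Convert explicitly, so that a string which is not a valid integer is
# caught here rather than inside set.seed().
ids <- as.integer(seeds)
stopifnot(!anyNA(ids))

# Seed once, with a small integer (1 is an arbitrary choice), then draw
# one value per ID in a single call.
set.seed(1)
random_values <- runif(length(ids), min = 17, max = 26)
names(random_values) <- seeds

print(summary(random_values))
```

With this design the draws depend on the order of the IDs, not on the
ID values themselves.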

    > This gives values that are **extremely** bunched together.

    >> summary(random_values)
    >    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    >   25.13   25.36   25.66   25.58   25.83   25.94

    > This behaviour of `runif` goes away when we use `kind =
    > "Knuth-TAOCP-2002"`, and we get values that appear to be
    > much more evenly spread out.

    >     random_values = sapply(seeds, function(x) {
    >       set.seed(x, kind = "Knuth-TAOCP-2002")
    >       y = runif(1, 17, 26)
    >       return(y)
    >     })

    > *Output omitted.*

    > ---

    > **The most interesting thing here is that this does not
    > happen on Windows -- only happens on Ubuntu**
    > (`sessionInfo` output for Ubuntu & Windows below).

    > # Windows output: #

    >> seeds = c(
    > +   "86548915", "86551615", "86566163", "86577411", "86584144",
    > +   "86584272", "86620568", "86724613", "86756002", "86768593", "86772411",
    > +   "86781516", "86794389", "86805854", "86814600", "86835092", "86874179",
    > +   "86876466", "86901193", "86987847", "86988080")
    >> 
    >> random_values = sapply(seeds, function(x) {
    > +   set.seed(x)
    > +   y = runif(1, 17, 26)
    > +   return(y)
    > + })
    >> 
    >> summary(random_values)
    >    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    >   17.32   20.14   23.00   22.17   24.07   25.90

    > Can someone help understand what is going on?

    > Ubuntu
    > ------

    > R version 3.4.0 (2017-04-21)
    > Platform: x86_64-pc-linux-gnu (64-bit)
    > Running under: Ubuntu 16.04.2 LTS

You have not learned to get a current version of R.
===> You should not write to R-devel (sorry if this may sound harsh ..)

Hint:
   We know that Ubuntu LTS -- by virtue of being LTS (Long Term
   Support) -- will not update R.
   But the Ubuntu/Debian pages on CRAN tell you how to make sure you
   automatically get current versions of R on your Ubuntu computer
   (namely by adding a CRAN mirror to your Ubuntu sources).

And then in your sessionInfo :

    ....
       38 packages attached + 56 namespaces loaded !!
    ....

   and similar nonsense (tons of packages+namespaces)
   on Windows, which uses an even more outdated version of
   R, 3.3.2.

-------------

Can you please learn to work with a minimal reproducible example (MRE)?
(Well, you are close in your R code, but not if you load 50
 packages and do who-knows-what before running the example;
 your RNGkind() and many other things could have been changed ...)

Since you run Ubuntu, you know the shell and you could
(after installing a current version of R) put your MRE in a
small *.R script and do

   R CMD BATCH --vanilla  MRE.R

which will produce MRE.Rout  with all input/output

BTW: Even on Windoze you can do similarly, once you've found the
location of 'Rcmd.exe':

   ......\Rcmd BATCH --vanilla MRE.R

should work there as well and deliver MRE.Rout
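
As an illustration, the contents of such an MRE.R might look roughly
like the sketch below. It uses base R only and the seeds from the
original post; making the string-to-integer conversion explicit is an
assumption about what was intended, not part of the original code.

```r
# MRE.R -- run with:  R CMD BATCH --vanilla MRE.R
# No library() calls: only base R is used.

# Record which generators are in effect before anything is seeded.
print(RNGkind())

seeds <- c("86548915", "86551615", "86566163", "86577411", "86584144",
           "86584272", "86620568", "86724613", "86756002", "86768593",
           "86772411", "86781516", "86794389", "86805854", "86814600",
           "86835092", "86874179", "86876466", "86901193", "86987847",
           "86988080")

# One draw per seed, as in the original post; the conversion to integer
# is made explicit here (an assumption, see above).
random_values <- sapply(seeds, function(x) {
  set.seed(as.integer(x))
  runif(1, 17, 26)
})

print(summary(random_values))

# Record the R version and platform alongside the results.
print(sessionInfo())
```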

- - - - -
After doing all this, your problem may still be just
because you are using much too large integers for the 'seed'
argument of set.seed()
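
One way to look into this would be to compare the first draw after
seeding with the large IDs against the first draw after seeding with
small integers. The helper first_draw() and the choice of 1, ..., 21
as small seeds are hypothetical, made up for this sketch; what the
summaries show has to be judged from the actual output.

```r
# First uniform draw on [17, 26] after seeding with each given value.
# (first_draw is a hypothetical helper, not part of R.)
first_draw <- function(seed_values) {
  sapply(seed_values, function(s) {
    set.seed(s, kind = "Mersenne-Twister")
    runif(1, 17, 26)
  })
}

# The 8-digit IDs from the original post, as integers.
large_seeds <- c(86548915L, 86551615L, 86566163L, 86577411L, 86584144L,
                 86584272L, 86620568L, 86724613L, 86756002L, 86768593L,
                 86772411L, 86781516L, 86794389L, 86805854L, 86814600L,
                 86835092L, 86874179L, 86876466L, 86901193L, 86987847L,
                 86988080L)

# Small integer seeds 1, 2, ..., 21 (an arbitrary choice for comparison).
small_seeds <- seq_along(large_seeds)

draws_large <- first_draw(large_seeds)
draws_small <- first_draw(small_seeds)

print(summary(draws_large))
print(summary(draws_small))
```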

I really really strongly believe you should have used R-help
instead of R-devel.

Best,
Martin Maechler


