[Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

Tirthankar Chakravarty tirthankar.lists at gmail.com
Fri Nov 3 08:49:12 CET 2017


This is cross-posted from SO (https://stackoverflow.com/q/47079702/1414455),
but I now feel that this needs someone from R-Devel to help understand why
this is happening.

We are facing a weird situation in our code when using R's [`runif`][1] and
setting seed with `set.seed` with the `kind = NULL` option (which resolves,
unless I am mistaken, to `kind = "default"`; the default being
`"Mersenne-Twister"`).

We set the seed using (8 digit) unique IDs generated by an upstream system,
before calling `runif`:

    seeds = c(
      "86548915", "86551615", "86566163", "86577411", "86584144",
      "86584272", "86620568", "86724613", "86756002", "86768593",
"86772411",
      "86781516", "86794389", "86805854", "86814600", "86835092",
"86874179",
      "86876466", "86901193", "86987847", "86988080")

    random_values = sapply(seeds, function(x) {
      set.seed(x)
      y = runif(1, 17, 26)
      return(y)
    })

This gives values that are **extremely** bunched together.

    > summary(random_values)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      25.13   25.36   25.66   25.58   25.83   25.94

This behaviour of `runif` goes away when we use `kind =
"Knuth-TAOCP-2002"`, and we get values that appear to be much more evenly
spread out.

    random_values = sapply(seeds, function(x) {
      set.seed(x, kind = "Knuth-TAOCP-2002")
      y = runif(1, 17, 26)
      return(y)
    })

*Output omitted.*

---

**The most interesting thing here is that this does not happen on Windows
-- only happens on Ubuntu** (`sessionInfo` output for Ubuntu & Windows
below).

# Windows output: #

    > seeds = c(
    +   "86548915", "86551615", "86566163", "86577411", "86584144",
    +   "86584272", "86620568", "86724613", "86756002", "86768593",
"86772411",
    +   "86781516", "86794389", "86805854", "86814600", "86835092",
"86874179",
    +   "86876466", "86901193", "86987847", "86988080")
    >
    > random_values = sapply(seeds, function(x) {
    +   set.seed(x)
    +   y = runif(1, 17, 26)
    +   return(y)
    + })
    >
    > summary(random_values)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      17.32   20.14   23.00   22.17   24.07   25.90

Can someone help understand what is going on?

Ubuntu
------

    R version 3.4.0 (2017-04-21)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 16.04.2 LTS

    Matrix products: default
    BLAS: /usr/lib/libblas/libblas.so.3.6.0
    LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

    locale:
    [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8
     [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8
    [11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets
methods   base

    other attached packages:
    [1] RMySQL_0.10.8               DBI_0.6-1
     [3] jsonlite_1.4                tidyjson_0.2.2
     [5] optiRum_0.37.3              lubridate_1.6.0
     [7] httr_1.2.1                  gdata_2.18.0
     [9] XLConnect_0.2-12            XLConnectJars_0.2-12
    [11] data.table_1.10.4           stringr_1.2.0
    [13] readxl_1.0.0                xlsx_0.5.7
    [15] xlsxjars_0.6.1              rJava_0.9-8
    [17] sqldf_0.4-10                RSQLite_1.1-2
    [19] gsubfn_0.6-6                proto_1.0.0
    [21] dplyr_0.5.0                 purrr_0.2.4
    [23] readr_1.1.1                 tidyr_0.6.3
    [25] tibble_1.3.0                tidyverse_1.1.1
    [27] rBayesianOptimization_1.1.0 xgboost_0.6-4
    [29] MLmetrics_1.1.1             caret_6.0-76
    [31] ROCR_1.0-7                  gplots_3.0.1
    [33] effects_3.1-2               pROC_1.10.0
    [35] pscl_1.4.9                  lattice_0.20-35
    [37] MASS_7.3-47                 ggplot2_2.2.1

    loaded via a namespace (and not attached):
    [1] splines_3.4.0      foreach_1.4.3      AUC_0.3.0
modelr_0.1.0
     [5] gtools_3.5.0       assertthat_0.2.0   stats4_3.4.0
 cellranger_1.1.0
     [9] quantreg_5.33      chron_2.3-50       digest_0.6.10
rvest_0.3.2
    [13] minqa_1.2.4        colorspace_1.3-2   Matrix_1.2-10
plyr_1.8.4
    [17] psych_1.7.3.21     XML_3.98-1.7       broom_0.4.2
SparseM_1.77
    [21] haven_1.0.0        scales_0.4.1       lme4_1.1-13
MatrixModels_0.4-1
    [25] mgcv_1.8-17        car_2.1-5          nnet_7.3-12
lazyeval_0.2.0
    [29] pbkrtest_0.4-7     mnormt_1.5-5       magrittr_1.5
 memoise_1.0.0
    [33] nlme_3.1-131       forcats_0.2.0      xml2_1.1.1
 foreign_0.8-69
    [37] tools_3.4.0        hms_0.3            munsell_0.4.3
compiler_3.4.0
    [41] caTools_1.17.1     rlang_0.1.1        grid_3.4.0
 nloptr_1.0.4
    [45] iterators_1.0.8    bitops_1.0-6       tcltk_3.4.0
gtable_0.2.0
    [49] ModelMetrics_1.1.0 codetools_0.2-15   reshape2_1.4.2     R6_2.2.0

    [53] knitr_1.15.1       KernSmooth_2.23-15 stringi_1.1.5
Rcpp_0.12.11



Windows
-------

    > sessionInfo()
    R version 3.3.2 (2016-10-31)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows >= 8 x64 (build 9200)

    locale:
    [1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252
LC_MONETARY=English_India.1252
    [4] LC_NUMERIC=C                   LC_TIME=English_India.1252

    attached base packages:
    [1] graphics  grDevices utils     datasets  grid      stats
 methods   base

    other attached packages:
     [1] bindrcpp_0.2         h2o_3.14.0.3         ggrepel_0.6.5
eulerr_1.1.0         VennDiagram_1.6.17
     [6] futile.logger_1.4.3  scales_0.4.1         FinCal_0.6.3
 xml2_1.0.0           httr_1.3.0
    [11] wesanderson_0.3.2    wordcloud_2.5        RColorBrewer_1.1-2
 htmltools_0.3.6      urltools_1.6.0
    [16] timevis_0.4          dtplyr_0.0.1         magrittr_1.5
 shiny_1.0.5          RODBC_1.3-14
    [21] zoo_1.8-0            sqldf_0.4-10         RSQLite_1.1-2
gsubfn_0.6-6         proto_1.0.0
    [26] gdata_2.17.0         stringr_1.2.0        XLConnect_0.2-12
 XLConnectJars_0.2-12 data.table_1.10.4
    [31] xlsx_0.5.7           xlsxjars_0.6.1       rJava_0.9-8
readxl_0.1.1         googlesheets_0.2.1
    [36] jsonlite_1.5         tidyjson_0.2.1       RMySQL_0.10.9
RPostgreSQL_0.4-1    DBI_0.5-1
    [41] dplyr_0.7.2          purrr_0.2.3          readr_1.1.1
tidyr_0.7.0          tibble_1.3.3
    [46] ggplot2_2.2.0        tidyverse_1.0.0      lubridate_1.6.0

    loaded via a namespace (and not attached):
     [1] gtools_3.5.0         assertthat_0.2.0     triebeard_0.3.0
cellranger_1.1.0     yaml_2.1.14
     [6] slam_0.1-40          lattice_0.20-34      glue_1.1.1
 chron_2.3-48         digest_0.6.12.1
    [11] colorspace_1.3-1     httpuv_1.3.5         plyr_1.8.4
 pkgconfig_2.0.1      xtable_1.8-2
    [16] lazyeval_0.2.0       mime_0.5             memoise_1.0.0
tools_3.3.2          hms_0.3
    [21] munsell_0.4.3        lambda.r_1.1.9       rlang_0.1.1
RCurl_1.95-4.8       labeling_0.3
    [26] bitops_1.0-6         tcltk_3.3.2          gtable_0.2.0
 reshape2_1.4.2       R6_2.2.0
    [31] bindr_0.1            futile.options_1.0.0 stringi_1.1.2
Rcpp_0.12.12.1

  [1]: http://stat.ethz.ch/R-manual/R-devel/library/stats/html/Uniform.html

	[[alternative HTML version deleted]]



More information about the R-devel mailing list