[Rd] bias issue in sample() (PR 17494)

Gabriel Becker g@bembecker @end|ng |rom gm@||@com
Wed Feb 20 08:01:53 CET 2019


Luke,

I'm happy to help with this. Its great to see this get tackled (I've cc'ed
Kelli Ottoboni who helped flag this issue).

I can prepare a patch for the RNGkind related stuff and the doc update.

As for ???, what are your (and others') thoughts about the possibility of
a) a reproducibility API which takes either an R version (or maybe
alternatively a date) and sets the RNGkind to the default for that
version/date, and/or b) that sessionInfo be modified to capture (and
display) the RNGkind in effect.

Best,
~G


On Tue, Feb 19, 2019 at 11:52 AM Tierney, Luke <luke-tierney using uiowa.edu>
wrote:

>
> Before the next release we really should to sort out the bias issue in
> sample() reported by Ottoboni and Stark in
> https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf and
> filed aa a bug report by Duncan Murdoch at
> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494.
>
> Here are two examples of bad behavior through current R-devel:
>
>      set.seed(123)
>      m <- (2/5) * 2^32
>      x <- sample(m, 1000000, replace = TRUE)
>      table(x %% 2, x > m / 2)
>      ##
>      ##    FALSE   TRUE
>      ## 0 300620 198792
>      ## 1 200196 300392
>
>      table(sample(2/7 * 2^32, 1000000, replace = TRUE) %% 2)
>      ##
>      ##      0      1
>      ## 429054 570946
>
> I committed a modification to R_unif_index to address this by
> generating random bits (blocks of 16) and rejection sampling, but for
> now this is only enabled if the environment variable R_NEW_SAMPLE is
> set before the first call.
>
> Some things still needed:
>
> - someone to look over the change and see if there are any issues
> - adjustment of RNGkind to allowing the old behavior to be selected
> - make the new behavior the default
> - adjust documentation
> - ???
>
> Unfortunately I don't have enough free cycles to do this, but I can
> help if someone else can take the lead.
>
> There are two other places I found that might suffer from the same
> issue, in walker_ProbSampleReplace (pointed out bu O & S) and in
> src/nmath/wilcox.c.  Both can be addressed by using R_unif_index. I
> have done that for walker_ProbSampleReplace, but the wilcox change
> might need adjusting to support the standalone math library and I
> don't feel confident enough I'd get that right.
>
> Best,
>
> luke
>
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>     Actuarial Science
> 241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

	[[alternative HTML version deleted]]



More information about the R-devel mailing list