# [R] sample(c(0, 1)...) vs. rbinom

Albyn Jones jones at reed.edu
Thu May 23 17:29:55 CEST 2013

After a bit of playing around, I discovered that
sample() does something similar in other situations:

> set.seed(105021)
> sample(1:5,1,prob=c(1,1,1,1,1))
[1] 3
> set.seed(105021)
> sample(1:5,1)
[1] 2

> set.seed(105021)
> sample(1:5,5,prob=c(1,1,1,1,1))
[1] 3 4 2 1 5
> set.seed(105021)
> sample(1:5,5)
[1] 2 5 1 4 3
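
As a quick check that prob= only changes which values come out for a
given seed, not how often each value appears, here is a minimal sketch.
The sample size n is an arbitrary assumption, and the seed is simply
reused from the examples above; the code compares relative frequencies
with and without prob=:

```r
# Assumption: n is an arbitrary illustrative sample size
n <- 1e5

set.seed(105021)
with_prob <- sample(1:5, n, replace = TRUE, prob = c(1, 1, 1, 1, 1))

set.seed(105021)
without_prob <- sample(1:5, n, replace = TRUE)

# Proportion of positions where the two calls return the same value;
# the individual draws need not agree
mean(with_prob == without_prob)

# Relative frequency of each value 1..5 under both calls
freq_with <- tabulate(with_prob, nbins = 5) / n
freq_without <- tabulate(without_prob, nbins = 5) / n
rbind(freq_with, freq_without)
```

Both rows should come out close to 0.2, which would suggest the
difference lies in the stream of values and not in the distribution.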

albyn

On 2013-05-22 22:24, peter dalgaard wrote:
> On May 23, 2013, at 07:01 , Jeff Newmiller wrote:
>
>> You seem to be building an elaborate structure for testing the
>> reproducibility of the random number generator. I suspect that rbinom
>> is calling the random number generator a different number of times
>> when you pass prob=0.5 than otherwise.
>
> Nope. It's switching 0 and 1:
>
>> set.seed(1); sample(0:1,10,replace=TRUE,prob=c(1-pp,pp));
>> set.seed(1); rbinom(10,1,pp)
>  [1] 1 1 0 0 1 0 0 0 0 1
>  [1] 0 0 1 1 0 1 1 1 1 0
>
> which is curious, but of course has no implication for the
> distributional properties. Curiouser still, if you drop the prob=
> argument in sample(), the two results agree:
>
>> set.seed(1); sample(0:1,10,replace=TRUE)
>> set.seed(1); rbinom(10,1,pp)
>  [1] 0 0 1 1 0 1 1 1 1 0
>  [1] 0 0 1 1 0 1 1 1 1 0
>
> However, it was never a design goal that two different random
> functions (or even two code paths within the same function) should
> give exactly the same values, even if they simulate the same
> distribution, so this is nothing more than a curiosity.
>
>
>>>
>>> Appendix A: some R code that exhibits the problem
>>> =================================================
>>>
>>> ppp <- seq(0, 1, by = 0.01)
>>>
>>> result <- do.call(rbind, lapply(ppp, function(p) {
>>>     set.seed(1)
>>>     sampleRes <- sample(c(0, 1), size = 1, replace = TRUE,
>>>                         prob=c(1-p, p))
>>>
>>>     set.seed(1)
>>>     rbinomRes <- rbinom(1, size = 1, prob = p)
>>>
>>>     data.frame(prob = p, equivalent = all(sampleRes == rbinomRes))
>>>
>>> }))
>>>
>>> result
>>>
>>>
>>> Appendix B: the output from the R code
>>> ======================================
>>>
>>>     prob equivalent
>>> 1   0.00       TRUE
>>> 2   0.01       TRUE
>>> 51  0.50      FALSE
>>>
>>> Appendix C: Session information
>>> ===============================
>>>
>>>> sessionInfo()
>>> R version 3.0.0 (2013-04-03)
>>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>  [7] LC_PAPER=C                 LC_NAME=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
