[R] sample(c(0, 1)...) vs. rbinom

Albyn Jones jones at reed.edu
Thu May 23 17:29:55 CEST 2013


After a bit of playing around, I discovered that
sample() does something similar in other situations:

> set.seed(105021)
> sample(1:5,1,prob=c(1,1,1,1,1))
[1] 3
> set.seed(105021)
> sample(1:5,1)
[1] 2


> set.seed(105021)
> sample(1:5,5,prob=c(1,1,1,1,1))
[1] 3 4 2 1 5
> set.seed(105021)
> sample(1:5,5)
[1] 2 5 1 4 3

albyn


On 2013-05-22 22:24, peter dalgaard wrote:
> On May 23, 2013, at 07:01 , Jeff Newmiller wrote:
>
>> You seem to be building an elaborate structure for testing the 
>> reproducibility of the random number generator. I suspect that rbinom 
>> is calling the random number generator a different number of times 
>> when you pass prob=0.5 than otherwise.
>
> Nope. It's switching 0 and 1:
>
>> set.seed(1); sample(0:1,10,replace=TRUE,prob=c(1-pp,pp)); 
>> set.seed(1); rbinom(10,1,pp)
>  [1] 1 1 0 0 1 0 0 0 0 1
>  [1] 0 0 1 1 0 1 1 1 1 0
>
> which is curious, but of course has no implication for the
> distributional properties. Curiouser, if you drop the prob= in 
> sample.
>
>> set.seed(1); sample(0:1,10,replace=TRUE); set.seed(1); 
>> rbinom(10,1,pp)
>  [1] 0 0 1 1 0 1 1 1 1 0
>  [1] 0 0 1 1 0 1 1 1 1 0
>
> However, it was never a design goal that two different random
> functions (or even two code paths within the same function) should
> give exactly the same values, even if they simulate the same
> distribution, so this is nothing more than a curiosity.
>
>
>>>
>>> Appendix A: some R code that exhibits the problem
>>> =================================================
>>>
>>> ppp <- seq(0, 1, by = 0.01)
>>>
>>> result <- do.call(rbind, lapply(ppp, function(p) {
>>>     set.seed(1)
>>>     sampleRes <- sample(c(0, 1), size = 1, replace = TRUE,
>>>                         prob=c(1-p, p))
>>>
>>>     set.seed(1)
>>>     rbinomRes <- rbinom(1, size = 1, prob = p)
>>>
>>>     data.frame(prob = p, equivalent = all(sampleRes == rbinomRes))
>>>
>>> }))
>>>
>>> result
>>>
>>>
>>> Appendix B: the output from the R code
>>> ======================================
>>>
>>>     prob equivalent
>>> 1   0.00       TRUE
>>> 2   0.01       TRUE
>>> 3   0.02       TRUE
>>> 4   0.03       TRUE
>>> 5   0.04       TRUE
>>> 6   0.05       TRUE
>>> 7   0.06       TRUE
>>> 8   0.07       TRUE
>>> 9   0.08       TRUE
>>> 10  0.09       TRUE
>>> 11  0.10       TRUE
>>> 12  0.11       TRUE
>>> 13  0.12       TRUE
>>> 14  0.13       TRUE
>>> 15  0.14       TRUE
>>> 16  0.15       TRUE
>>> 17  0.16       TRUE
>>> 18  0.17       TRUE
>>> 19  0.18       TRUE
>>> 20  0.19       TRUE
>>> 21  0.20       TRUE
>>> 22  0.21       TRUE
>>> 23  0.22       TRUE
>>> 24  0.23       TRUE
>>> 25  0.24       TRUE
>>> 26  0.25       TRUE
>>> 27  0.26       TRUE
>>> 28  0.27       TRUE
>>> 29  0.28       TRUE
>>> 30  0.29       TRUE
>>> 31  0.30       TRUE
>>> 32  0.31       TRUE
>>> 33  0.32       TRUE
>>> 34  0.33       TRUE
>>> 35  0.34       TRUE
>>> 36  0.35       TRUE
>>> 37  0.36       TRUE
>>> 38  0.37       TRUE
>>> 39  0.38       TRUE
>>> 40  0.39       TRUE
>>> 41  0.40       TRUE
>>> 42  0.41       TRUE
>>> 43  0.42       TRUE
>>> 44  0.43       TRUE
>>> 45  0.44       TRUE
>>> 46  0.45       TRUE
>>> 47  0.46       TRUE
>>> 48  0.47       TRUE
>>> 49  0.48       TRUE
>>> 50  0.49       TRUE
>>> 51  0.50      FALSE
>>> 52  0.51       TRUE
>>> 53  0.52       TRUE
>>> 54  0.53       TRUE
>>> 55  0.54       TRUE
>>> 56  0.55       TRUE
>>> 57  0.56       TRUE
>>> 58  0.57       TRUE
>>> 59  0.58       TRUE
>>> 60  0.59       TRUE
>>> 61  0.60       TRUE
>>> 62  0.61       TRUE
>>> 63  0.62       TRUE
>>> 64  0.63       TRUE
>>> 65  0.64       TRUE
>>> 66  0.65       TRUE
>>> 67  0.66       TRUE
>>> 68  0.67       TRUE
>>> 69  0.68       TRUE
>>> 70  0.69       TRUE
>>> 71  0.70       TRUE
>>> 72  0.71       TRUE
>>> 73  0.72       TRUE
>>> 74  0.73       TRUE
>>> 75  0.74       TRUE
>>> 76  0.75       TRUE
>>> 77  0.76       TRUE
>>> 78  0.77       TRUE
>>> 79  0.78       TRUE
>>> 80  0.79       TRUE
>>> 81  0.80       TRUE
>>> 82  0.81       TRUE
>>> 83  0.82       TRUE
>>> 84  0.83       TRUE
>>> 85  0.84       TRUE
>>> 86  0.85       TRUE
>>> 87  0.86       TRUE
>>> 88  0.87       TRUE
>>> 89  0.88       TRUE
>>> 90  0.89       TRUE
>>> 91  0.90       TRUE
>>> 92  0.91       TRUE
>>> 93  0.92       TRUE
>>> 94  0.93       TRUE
>>> 95  0.94       TRUE
>>> 96  0.95       TRUE
>>> 97  0.96       TRUE
>>> 98  0.97       TRUE
>>> 99  0.98       TRUE
>>> 100 0.99       TRUE
>>> 101 1.00       TRUE
>>>
>>> Appendix C: Session information
>>> ===============================
>>>
>>>> sessionInfo()
>>> R version 3.0.0 (2013-04-03)
>>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>  [7] LC_PAPER=C                 LC_NAME=C
>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   
>>> base
>>>
>>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list