[R] sample(c(0, 1)...) vs. rbinom

peter dalgaard pdalgd at gmail.com
Thu May 23 07:24:49 CEST 2013


On May 23, 2013, at 07:01 , Jeff Newmiller wrote:

> You seem to be building an elaborate structure for testing the reproducibility of the random number generator. I suspect that rbinom is calling the random number generator a different number of times when you pass prob=0.5 than otherwise.

Nope. It's switching 0 and 1:

> set.seed(1); sample(0:1,10,replace=TRUE,prob=c(1-pp,pp)); set.seed(1); rbinom(10,1,pp)
 [1] 1 1 0 0 1 0 0 0 0 1
 [1] 0 0 1 1 0 1 1 1 1 0

which is curious, but of course has no implication for the distributional properties. Curiouser, if you drop the prob= in sample.

> set.seed(1); sample(0:1,10,replace=TRUE); set.seed(1); rbinom(10,1,pp)
 [1] 0 0 1 1 0 1 1 1 1 0
 [1] 0 0 1 1 0 1 1 1 1 0

However, it was never a design goal that two different random functions (or even two code paths within the same function) should give exactly the same values, even if they simulate the same distribution, so this is nothing more than a curiosity.
 

>> 
>> Appendix A: some R code that exhibits the problem
>> =================================================
>> 
>> ppp <- seq(0, 1, by = 0.01)
>> 
>> result <- do.call(rbind, lapply(ppp, function(p) {
>>     set.seed(1)
>>     sampleRes <- sample(c(0, 1), size = 1, replace = TRUE,
>>                         prob=c(1-p, p))
>>     
>>     set.seed(1)
>>     rbinomRes <- rbinom(1, size = 1, prob = p)
>>     
>>     data.frame(prob = p, equivalent = all(sampleRes == rbinomRes))
>>     
>> }))
>> 
>> result
>> 
>> 
>> Appendix B: the output from the R code
>> ======================================
>> 
>>     prob equivalent
>> 1   0.00       TRUE
>> 2   0.01       TRUE
>> 3   0.02       TRUE
>> 4   0.03       TRUE
>> 5   0.04       TRUE
>> 6   0.05       TRUE
>> 7   0.06       TRUE
>> 8   0.07       TRUE
>> 9   0.08       TRUE
>> 10  0.09       TRUE
>> 11  0.10       TRUE
>> 12  0.11       TRUE
>> 13  0.12       TRUE
>> 14  0.13       TRUE
>> 15  0.14       TRUE
>> 16  0.15       TRUE
>> 17  0.16       TRUE
>> 18  0.17       TRUE
>> 19  0.18       TRUE
>> 20  0.19       TRUE
>> 21  0.20       TRUE
>> 22  0.21       TRUE
>> 23  0.22       TRUE
>> 24  0.23       TRUE
>> 25  0.24       TRUE
>> 26  0.25       TRUE
>> 27  0.26       TRUE
>> 28  0.27       TRUE
>> 29  0.28       TRUE
>> 30  0.29       TRUE
>> 31  0.30       TRUE
>> 32  0.31       TRUE
>> 33  0.32       TRUE
>> 34  0.33       TRUE
>> 35  0.34       TRUE
>> 36  0.35       TRUE
>> 37  0.36       TRUE
>> 38  0.37       TRUE
>> 39  0.38       TRUE
>> 40  0.39       TRUE
>> 41  0.40       TRUE
>> 42  0.41       TRUE
>> 43  0.42       TRUE
>> 44  0.43       TRUE
>> 45  0.44       TRUE
>> 46  0.45       TRUE
>> 47  0.46       TRUE
>> 48  0.47       TRUE
>> 49  0.48       TRUE
>> 50  0.49       TRUE
>> 51  0.50      FALSE
>> 52  0.51       TRUE
>> 53  0.52       TRUE
>> 54  0.53       TRUE
>> 55  0.54       TRUE
>> 56  0.55       TRUE
>> 57  0.56       TRUE
>> 58  0.57       TRUE
>> 59  0.58       TRUE
>> 60  0.59       TRUE
>> 61  0.60       TRUE
>> 62  0.61       TRUE
>> 63  0.62       TRUE
>> 64  0.63       TRUE
>> 65  0.64       TRUE
>> 66  0.65       TRUE
>> 67  0.66       TRUE
>> 68  0.67       TRUE
>> 69  0.68       TRUE
>> 70  0.69       TRUE
>> 71  0.70       TRUE
>> 72  0.71       TRUE
>> 73  0.72       TRUE
>> 74  0.73       TRUE
>> 75  0.74       TRUE
>> 76  0.75       TRUE
>> 77  0.76       TRUE
>> 78  0.77       TRUE
>> 79  0.78       TRUE
>> 80  0.79       TRUE
>> 81  0.80       TRUE
>> 82  0.81       TRUE
>> 83  0.82       TRUE
>> 84  0.83       TRUE
>> 85  0.84       TRUE
>> 86  0.85       TRUE
>> 87  0.86       TRUE
>> 88  0.87       TRUE
>> 89  0.88       TRUE
>> 90  0.89       TRUE
>> 91  0.90       TRUE
>> 92  0.91       TRUE
>> 93  0.92       TRUE
>> 94  0.93       TRUE
>> 95  0.94       TRUE
>> 96  0.95       TRUE
>> 97  0.96       TRUE
>> 98  0.97       TRUE
>> 99  0.98       TRUE
>> 100 0.99       TRUE
>> 101 1.00       TRUE
>> 
>> Appendix C: Session information
>> ===============================
>> 
>>> sessionInfo()
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-redhat-linux-gnu (64-bit)
>> 
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
>>  [7] LC_PAPER=C                 LC_NAME=C                 
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
>> 
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base   
>>  
>>> 
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com



More information about the R-help mailing list