[R] sample(c(0, 1)...) vs. rbinom

Michael Hannon jm_hannon at yahoo.com
Thu May 23 04:54:01 CEST 2013


Greetings.  My wife is teaching an introductory stat class at UC Davis.  The
class emphasizes the use of simulations, rather than mathematics, to get
insight into statistics, and R is the mandated tool.   A student in the class
recently inquired about different approaches to sampling from a binomial
distribution.  I've appended some code that exhibits the idea, the gist of
which is that using sample(c(0, 1), ...) and rbinom(...) should give
equivalent results.

The surprising (to me) result is that the two approaches DO give the same
result, EXCEPT when the probability is exactly 0.5.  See Appendix A for the
code and Appendix B for the output.  I don't think this issue is
system-dependent, but I've put my session information in Appendix C.

Another wrinkle in this is that if I omit the "prob" parameter from the call
to sample, meaning to take the default value of 0.5, the two methods DO give
the same result.

Any thoughts about this?  Thanks.

--Mike

Appendix A: some R code that exhibits the problem
=================================================

ppp <- seq(0, 1, by = 0.01)

result <- do.call(rbind, lapply(ppp, function(p) {
    set.seed(1)
    sampleRes <- sample(c(0, 1), size = 1, replace = TRUE,
                        prob=c(1-p, p))
    
    set.seed(1)
    rbinomRes <- rbinom(1, size = 1, prob = p)
    
    data.frame(prob = p, equivalent = all(sampleRes == rbinomRes))
    
}))

result


Appendix B: the output from the R code
======================================

    prob equivalent
1   0.00       TRUE
2   0.01       TRUE
3   0.02       TRUE
4   0.03       TRUE
5   0.04       TRUE
6   0.05       TRUE
7   0.06       TRUE
8   0.07       TRUE
9   0.08       TRUE
10  0.09       TRUE
11  0.10       TRUE
12  0.11       TRUE
13  0.12       TRUE
14  0.13       TRUE
15  0.14       TRUE
16  0.15       TRUE
17  0.16       TRUE
18  0.17       TRUE
19  0.18       TRUE
20  0.19       TRUE
21  0.20       TRUE
22  0.21       TRUE
23  0.22       TRUE
24  0.23       TRUE
25  0.24       TRUE
26  0.25       TRUE
27  0.26       TRUE
28  0.27       TRUE
29  0.28       TRUE
30  0.29       TRUE
31  0.30       TRUE
32  0.31       TRUE
33  0.32       TRUE
34  0.33       TRUE
35  0.34       TRUE
36  0.35       TRUE
37  0.36       TRUE
38  0.37       TRUE
39  0.38       TRUE
40  0.39       TRUE
41  0.40       TRUE
42  0.41       TRUE
43  0.42       TRUE
44  0.43       TRUE
45  0.44       TRUE
46  0.45       TRUE
47  0.46       TRUE
48  0.47       TRUE
49  0.48       TRUE
50  0.49       TRUE
51  0.50      FALSE
52  0.51       TRUE
53  0.52       TRUE
54  0.53       TRUE
55  0.54       TRUE
56  0.55       TRUE
57  0.56       TRUE
58  0.57       TRUE
59  0.58       TRUE
60  0.59       TRUE
61  0.60       TRUE
62  0.61       TRUE
63  0.62       TRUE
64  0.63       TRUE
65  0.64       TRUE
66  0.65       TRUE
67  0.66       TRUE
68  0.67       TRUE
69  0.68       TRUE
70  0.69       TRUE
71  0.70       TRUE
72  0.71       TRUE
73  0.72       TRUE
74  0.73       TRUE
75  0.74       TRUE
76  0.75       TRUE
77  0.76       TRUE
78  0.77       TRUE
79  0.78       TRUE
80  0.79       TRUE
81  0.80       TRUE
82  0.81       TRUE
83  0.82       TRUE
84  0.83       TRUE
85  0.84       TRUE
86  0.85       TRUE
87  0.86       TRUE
88  0.87       TRUE
89  0.88       TRUE
90  0.89       TRUE
91  0.90       TRUE
92  0.91       TRUE
93  0.92       TRUE
94  0.93       TRUE
95  0.94       TRUE
96  0.95       TRUE
97  0.96       TRUE
98  0.97       TRUE
99  0.98       TRUE
100 0.99       TRUE
101 1.00       TRUE

Appendix C: Session information
===============================

> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
>



More information about the R-help mailing list