[R] sample(c(0, 1)...) vs. rbinom

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Thu May 23 07:01:19 CEST 2013


You seem to be building an elaborate structure for testing the reproducibility of the random number generator. I suspect that rbinom is calling the random number generator a different number of times when you pass prob=0.5 than otherwise.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

Michael Hannon <jm_hannon at yahoo.com> wrote:

>Greetings.  My wife is teaching an introductory stat class at UC
>Davis.  The
>class emphasizes the use of simulations, rather than mathematics, to
>get
>insight into statistics, and R is the mandated tool.   A student in the
>class
>recently inquired about different approaches to sampling from a
>binomial
>distribution.  I've appended some code that exhibits the idea, the gist
>of
>which is that using sample(c(0, 1), ...) and rbinom(...) should give
>equivalent results.
>
>The surprising (to me) result is that the two approaches DO give the
>same
>result, EXCEPT when the probability is exactly 0.5.  See Appendix A for
>the
>code and Appendix B for the output.  I don't think this issue is
>system-dependent, but I've put my session information in Appendix C.
>
>Another wrinkle in this is that if I omit the "prob" parameter from the
>call
>to sample, meaning to take the default value of 0.5, the two methods DO
>give
>the same result.
>
>Any thoughts about this?  Thanks.
>
>--Mike
>
>Appendix A: some R code that exhibits the problem
>=================================================
>
>ppp <- seq(0, 1, by = 0.01)
>
>result <- do.call(rbind, lapply(ppp, function(p) {
>    set.seed(1)
>    sampleRes <- sample(c(0, 1), size = 1, replace = TRUE,
>                        prob=c(1-p, p))
>    
>    set.seed(1)
>    rbinomRes <- rbinom(1, size = 1, prob = p)
>    
>    data.frame(prob = p, equivalent = all(sampleRes == rbinomRes))
>    
>}))
>
>result
>
>
>Appendix B: the output from the R code
>======================================
>
>    prob equivalent
>1   0.00       TRUE
>2   0.01       TRUE
>3   0.02       TRUE
>4   0.03       TRUE
>5   0.04       TRUE
>6   0.05       TRUE
>7   0.06       TRUE
>8   0.07       TRUE
>9   0.08       TRUE
>10  0.09       TRUE
>11  0.10       TRUE
>12  0.11       TRUE
>13  0.12       TRUE
>14  0.13       TRUE
>15  0.14       TRUE
>16  0.15       TRUE
>17  0.16       TRUE
>18  0.17       TRUE
>19  0.18       TRUE
>20  0.19       TRUE
>21  0.20       TRUE
>22  0.21       TRUE
>23  0.22       TRUE
>24  0.23       TRUE
>25  0.24       TRUE
>26  0.25       TRUE
>27  0.26       TRUE
>28  0.27       TRUE
>29  0.28       TRUE
>30  0.29       TRUE
>31  0.30       TRUE
>32  0.31       TRUE
>33  0.32       TRUE
>34  0.33       TRUE
>35  0.34       TRUE
>36  0.35       TRUE
>37  0.36       TRUE
>38  0.37       TRUE
>39  0.38       TRUE
>40  0.39       TRUE
>41  0.40       TRUE
>42  0.41       TRUE
>43  0.42       TRUE
>44  0.43       TRUE
>45  0.44       TRUE
>46  0.45       TRUE
>47  0.46       TRUE
>48  0.47       TRUE
>49  0.48       TRUE
>50  0.49       TRUE
>51  0.50      FALSE
>52  0.51       TRUE
>53  0.52       TRUE
>54  0.53       TRUE
>55  0.54       TRUE
>56  0.55       TRUE
>57  0.56       TRUE
>58  0.57       TRUE
>59  0.58       TRUE
>60  0.59       TRUE
>61  0.60       TRUE
>62  0.61       TRUE
>63  0.62       TRUE
>64  0.63       TRUE
>65  0.64       TRUE
>66  0.65       TRUE
>67  0.66       TRUE
>68  0.67       TRUE
>69  0.68       TRUE
>70  0.69       TRUE
>71  0.70       TRUE
>72  0.71       TRUE
>73  0.72       TRUE
>74  0.73       TRUE
>75  0.74       TRUE
>76  0.75       TRUE
>77  0.76       TRUE
>78  0.77       TRUE
>79  0.78       TRUE
>80  0.79       TRUE
>81  0.80       TRUE
>82  0.81       TRUE
>83  0.82       TRUE
>84  0.83       TRUE
>85  0.84       TRUE
>86  0.85       TRUE
>87  0.86       TRUE
>88  0.87       TRUE
>89  0.88       TRUE
>90  0.89       TRUE
>91  0.90       TRUE
>92  0.91       TRUE
>93  0.92       TRUE
>94  0.93       TRUE
>95  0.94       TRUE
>96  0.95       TRUE
>97  0.96       TRUE
>98  0.97       TRUE
>99  0.98       TRUE
>100 0.99       TRUE
>101 1.00       TRUE
>
>Appendix C: Session information
>===============================
>
>> sessionInfo()
>R version 3.0.0 (2013-04-03)
>Platform: x86_64-redhat-linux-gnu (64-bit)
>
>locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
> [7] LC_PAPER=C                 LC_NAME=C                 
> [9] LC_ADDRESS=C               LC_TELEPHONE=C            
>[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
>
>attached base packages:
>[1] stats     graphics  grDevices utils     datasets  methods   base   
> 
>>
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list