[Rd] Bias in R's random integers?

Ralf Stubner ralf.stubner at daqana.com
Fri Sep 21 23:28:38 CEST 2018


On 9/21/18 6:38 PM, Tierney, Luke wrote:
> Not sure what should happen theoretically for the code in vseq.c, but
> I see the same pattern with the R generators I tried (default,
> Super-Duper, and L'Ecuyer) and with bash $RANDOM using
> 
> N <- 10000
> X1 <- replicate(N, as.integer(system("bash -c 'echo $RANDOM'", intern = TRUE)))
> X2 <- replicate(N, as.integer(system("bash -c 'echo $RANDOM'", intern = TRUE)))
> X <- X1 + 2 ^ 15 * (X2 > 2^14)
> 
> and with numbers from random.org
> 
> library(random)
> X <- randomNumbers(N, 0, 2^16-1, col = 1)
> 
> So I'm not convinced there is an issue.

There is an issue, but it is in vseq.c.

The plot I found striking was this:

http://people.redhat.com/sgrubb/files/r-random.jpg

It shows a scatter plot that is bounded by a rectangle whose upper right
and lower left corners are empty. Roughly speaking, X and Y correspond to
*consecutive differences* between random draws. Each difference is
obviously bounded by the range of the RNG. Two *differences in a row*
also sum to the difference between draws that are two apart, which is
bounded by the same range, so they cannot both be close to the maximum
(or both close to the minimum). Hence the expected shape for such a
scatter plot is a rectangle with two forbidden corners.
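
The forbidden corners can be checked mechanically. What follows is only a
minimal sketch, assuming draws x1, x2, x3 in [0, max] and differences
d1 = x2 - x1, d2 = x3 - x2; the function names are hypothetical and are
not taken from vseq.c:

```c
#include <assert.h>

/* Illustrative helpers only; none of these names come from vseq.c. */

/* Can (d1, d2) occur as two consecutive differences of draws in [0, max]?
 * Each difference lies in [-max, max], and so does their sum, because
 * d1 + d2 = (x2 - x1) + (x3 - x2) = x3 - x1. */
int in_allowed_region(long d1, long d2, long max)
{
    long sum = d1 + d2;
    return d1 >= -max && d1 <= max &&
           d2 >= -max && d2 <= max &&
           sum >= -max && sum <= max;
}

/* Exhaustively confirm that every triple of draws in [0, max] lands in
 * the allowed region. Intended for small max only (cubic running time). */
int all_triples_allowed(long max)
{
    long x1, x2, x3;

    assert(max >= 0);
    for (x1 = 0; x1 <= max; x1++)
        for (x2 = 0; x2 <= max; x2++)
            for (x3 = 0; x3 <= max; x3++)
                if (!in_allowed_region(x2 - x1, x3 - x2, max))
                    return 0;
    return 1;
}
```

With max = 20, for instance, in_allowed_region(20, 20, 20) and
in_allowed_region(-20, -20, 20) are both 0 (the two empty corners),
while in_allowed_region(20, -20, 20) is 1.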

Within the allowed region, there should be no structure whatsoever
(given enough draws). And that is what was striking about the above
picture: It showed clear vertical bands which should not be there. MT
(Mersenne Twister) does fail some statistical tests, but it cannot be
brought down that easily.

Interestingly, I first used Dirk's C++ function for convenience, and
that did *not* show these bands. But when I compiled vseq.c I could
reproduce this. To cut this short: There is an error in vseq.c when the
numbers are read in:

    tmp = strtoul(buf, NULL, 16);

The third argument to strtoul is the base in which the numbers should be
interpreted. However, R wrote the numbers in base 10. Such digit strings
are also valid base-16 input, but they then denote different values. Once
one changes the above line to

    tmp = strtoul(buf, NULL, 10);

the bands do disappear.
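
To make the effect of the base concrete, here is a small sketch. The
helper names are hypothetical, only strtoul itself is the real library
call, and the link to the bands is my guess rather than something I have
verified: a decimal digit string read as base 16 can only ever yield
values whose hexadecimal digits are all in 0-9, so large parts of the
range are never hit.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative helpers only; none of these names come from vseq.c. */

/* Parse the leading digits of buf in the given base (thin strtoul wrapper). */
unsigned long parse_in_base(const char *buf, int base)
{
    assert(buf != NULL);
    return strtoul(buf, NULL, base);
}

/* Return 1 if every hexadecimal digit of v is in 0-9, i.e. if v could be
 * the result of reading a decimal digit string as base 16; 0 otherwise. */
int reachable_from_decimal_text(unsigned long v)
{
    do {
        if (v % 16 > 9)
            return 0;
        v /= 16;
    } while (v != 0);
    return 1;
}
```

For the text "12345", parse_in_base gives 12345 with base 10 but 74565
(0x12345) with base 16, and a value such as 26 (0x1A) can never come out
of the base-16 misreading at all.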

cheerio
ralf

-- 
Ralf Stubner

