[R] Needed: Understanding runif() output :-)

Prof Brian D Ripley ripley at stats.ox.ac.uk
Thu May 25 17:09:20 CEST 2000


On Thu, 25 May 2000, Kjetil Kjernsmo wrote:

> Dear all,
> 
> I have been trying to understand what runif() is telling me. 
> I am generating lots of numbers (billions and billions (wow, I've dreamed
> about saying that for many years... :-) ), for a distribution that has the
> following quantile function:
>     1 / (2 * sqrt(1 - p)) 
> (that is, the distribution has a lower cutoff)
> As you can imagine, this has a rather heavy upper tail. I was looking at the
> largest values, and it looked as if the largest values appeared again and
> again. Large values in themselves were not strange, since I'm generating
> so many numbers, but the largest were very much larger than the second
> largest, and exactly the same number appeared again and again. At first I
> thought it was a bug, and I'm sorry to have wasted r-devel's time with a
> bug report.
> 
> I started running the same simulation with different RNGs and they all
> seem to generate numbers in "quantized states". Then, I started to look
> into what runif() gives, and let it print 13 digits. 
> In the output below, I use the "Mersenne-Twister" RNG and have generated
> 1e+10 numbers (100000 at a time), printing a line whenever the number is
> above 10000. The left column is my distribution; the right column is the
> corresponding runif() output.
> [1] 3.276800000000e+04 9.999999997672e-01
> [1] 13377.479981919865     0.999999998603
> [1] 1.158523750296e+04 9.999999981374e-01
> [1] 1.036215143684e+04 9.999999976717e-01
> [1] 1.158523750296e+04 9.999999981374e-01
> [1] 1.036215143684e+04 9.999999976717e-01
> [1] 1.036215143684e+04 9.999999976717e-01
> [1] 13377.479981919865     0.999999998603
> 
> So, it seems that the runif() outputs are "quantized" too. The question 
> is: What is the reason for this?
> I have been playing with the thought that it may be connected to the finite
> number representation capabilities of a computer. As I said, all the RNGs
> seem to have similar characteristics.

Yes, they are all quantized (all to ca 2^-31 or 2^-32).  Reason: that's
all we can assume for integer arithmetic. That is perfectly
sufficient for a *stable* procedure for using them. As I said before, I
don't think the algorithm you have for making use of them is adequate.  
Now, you haven't actually told us how you are generating numbers from your
distribution (and you haven't actually defined the distribution precisely
enough so that I could program it), but I guess you are using inversion.  
Don't: it is not adequate for your purposes.  You want to make use of
several random numbers if you want behaviour in the far tail.
Alternatively, you could plug in a generator with a finer quantization.
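
One possible reading of "several random numbers" is to combine two uniform
draws into a single uniform on a finer grid and then invert as before. The
sketch below assumes each runif() value carries about 32 bits of resolution;
runif_fine() and rheavy() are hypothetical names, not part of R, and this is
not necessarily the method intended above.

```r
# Hypothetical helper (not part of R): build a uniform on a finer grid
# from two runif() draws. Assumes each draw has roughly 32 bits of resolution.
runif_fine <- function(n) {
  hi <- floor(runif(n) * 2^32)   # index of the coarse cell, 0 .. 2^32 - 1
  lo <- runif(n)                 # position inside that cell, strictly in (0, 1)
  (hi + lo) / 2^32               # strictly positive because lo > 0
}

# Hypothetical sampler for the distribution in the question.
# The far tail corresponds to p near 1, i.e. v = 1 - p near 0.
# Since 1 - U is uniform whenever U is, v is drawn directly.
rheavy <- function(n) {
  v <- runif_fine(n)
  1 / (2 * sqrt(v))
}

set.seed(1)
x <- rheavy(1000)
print(summary(x))
```

Drawing v = 1 - p directly is a deliberate choice: doubles are much denser
near 0 than near 1, so the extra bits survive exactly where the tail lives.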

The moral is a familiar one: computer results are almost always
approximate, and you always have to watch out for the effects of the
approximations.
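
The effect of the approximation in this thread can be made concrete. Assuming
the uniforms lie on a grid of spacing 2^-32 (an assumption consistent with the
quoted output, not a statement about every generator), the largest possible
uniforms below 1 map through the quantile function as follows:

```r
# Assumption: the generator returns multiples of h = 2^-32.
h <- 2^-32
k <- 1:10                      # number of grid steps below 1
p <- 1 - k * h                 # the ten largest uniforms on the assumed grid
x <- 1 / (2 * sqrt(1 - p))     # quantile function from the question
print(cbind(k, x, p), digits = 13)
```

Under that assumption x equals 32768 / sqrt(k), so only ten distinct values
above 10000 can ever occur, the largest being 32768. The rows for k = 1, 6, 8
and 10 match the pairs in the quoted output.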

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



