[R] unexpected behaviour of rnorm(): summary

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Fri Nov 29 01:46:42 CET 2002


Robin Hankin <r.hankin at auckland.ac.nz> writes:

> The original script was
> 
> >f <- function(n){max(rnorm(n))}
> > plot(sapply(rep(5000,4000),f))   #[this takes my PC about 30 seconds]
> 
> Received wisdom is that "The Marsaglia-Multicarry and Kinderman-Ramage
> options don't play nicely together in the extreme tails of the Normal
> distribution" (PD) but I discovered last night that

[Unless you mean Probability Distribution, I'd say that that was an odd
way of abbreviating Thomas Lumley. PBR for Brian D. Ripley below has been
seen before from software that thinks his first name is "Professor"....]
 
> > f.uppertail <- function(n,upper=3){x <- rnorm(n);x[x>upper]}
> > plot(f.uppertail(1e7))
> 
> shows the effect much more directly...and viewed this way, the default
> RNG would seem to be more seriously flawed.  The second script above
> considers the upper (ie Z>3) tail of the Gaussian: and it has gaps.
> This is borderline "extreme tails" because pnorm(3) is only about
> 0.998.

> In contrast, Bug #1664 refers to the maximal value of a series of
> rnorm() values which, as PBR points out, _is_ a rather severe test of
> any RNG; the difference between f.uppertail() and #1664 would be that
> #1664 simulates
> 
> > dmaxnorm <- function(x,n){n* dnorm(x)*pnorm(x)^(n-1)}

> which involves PDFs and CDFs; f.uppertail() is solely a rnorm() issue.
> If nothing else, f.uppertail() is a simpler test for rnorm().

Actually, it's not all that different, and it is much easier to see if
you think in terms of non-exceedance probabilities:

  P(max X < x) = P(X < x)^n

so taking max just rescales the p axis on a plot of the CDF by taking
p^n, which will naturally emphasize the upper tail. However if the
distribution of X has a gap at x so will the distribution of the
maximum.

It *is* a bit uncomfortable. Probably someone ought to go in and
investigate exactly what the nature of the interaction between the two
methods amounts to, and maybe we should consider changing the default
eventually (although as Brian said there are back-compatibility issues
that make this rather less attractive).
 
-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._



More information about the R-help mailing list