[R] Efficient sampling from a discrete distribution in R

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Sep 4 07:18:04 CEST 2007


On Mon, 3 Sep 2007, Issac Trotts wrote:

> Hello r-help,
>
> As far as I've seen, there is no function in R dedicated to sampling
> from a discrete distribution with a specified mass function.  The
> standard library doesn't come with anything called rdiscrete or rpmf,
> and I can't find any such thing on the cheat sheet or in the
> Probability Distributions chapter of _An Introduction to R_.  Googling
> also didn't bring back anything.  So, here's my first attempt at a
> solution.  I'm hoping someone here knows of a more efficient way.

It's called sample().

There are much more efficient algorithms than the one you used, and 
sample() sometimes uses one of them (Walker's alias method): see any good 
book on simulation (including my 'Stochastic Simulation, 1987).

> # Sample from a discrete distribution with given probability mass function
> rdiscrete = function(size, pmf) {
>  stopifnot(length(pmf) > 1)
>  cmf = cumsum(pmf)
>  icmf = function(p) {
>    min(which(p < cmf))
>  }
>  ps = runif(size)
>  sapply(ps, icmf)
> }
>
> test.rdiscrete = function(N = 10000) {
>  err.tol = 6.0 / sqrt(N)
>  xs = rdiscrete(N, c(0.5, 0.5))
>  err = abs(sum(xs == 1) / N - 0.5)
>  stopifnot(err < err.tol)
>  list(e = err, xs = xs)
> }
>
> Thanks,
> Issac
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-help mailing list