[Rd] efficiency of sample() with prob.

Wed Jun 22 22:35:03 CEST 2005

You might recall the message

     R is a collaborative project with many contributors.

We suggest that you take up your own suggestion, research this area and 
offer the R project the results of your research for consideration as your 
contribution.

It seems likely that in sample(x, size, replace = TRUE, prob) the optimal 
method depends on both the size of 'x' and 'size' and perhaps to a lesser 
extent on 'prob'.  (That's what my book on the subject shows.)

On Wed, 22 Jun 2005, Bo Peng wrote:

> On 6/21/05, Vadim Ogranovich <vograno at evafunds.com> wrote:
>> In his "Introduction to Probability Models" Sheldon Ross describes (sec
>> 11.4.1, 8th edition) the alias method for such weighted sampling.
>> It is based on some decomposition of the original distribution (the
>> weights) into a mixture of two-point distributions.
>
> This sounds like Walker's alias method for weighted sampling. I looked
> through Knoth's 'the art of computer programming' and find this
> algorithm. I implemented this one but it is just as efficient as the
> bisection lookup method in my case. The reason is that the setup of
> this algorithm is complicated so it is suited for getting large sample
> from short weighted sequences. Anyway, I do suggest R developers try
> this algorithm for sample with replacement. A sample code can be found
> at http://statistik.wu-wien.ac.at/arvag/monograph/arvag-src/algo03_03.c
> .
>
> BTW, does anyone know a quicker algorithm to set up the internal table
> of the alias method?

Quicker than what?  See the discussion in my Stochastic Simulation book 
for `quicker than Walker'.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595