[Rd] uniform sampling without replacement algorithm

William Dunlap wdunlap at tibco.com
Wed Oct 18 16:38:55 CEST 2017


Splus used a similar method for sampling from "bigdata" objects.  One
problem was that sample() is used both for creating a sample and for
scrambling the order of a vector.  Scrambling the order of a big vector
wastes time.  It would be nice to be able to tell sample() that we don't
care about the order.


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Oct 17, 2017 at 10:55 AM, Pavel S. Ruzankin <ruzankin at math.nsc.ru>
wrote:

> Let us consider the current uniform sampling without replacement
> algorithm. It resides in function do_sample in
> https://svn.r-project.org/R/trunk/src/main/random.c
> Its complexity is obviously O(n), where the sample is selected from 1...n,
> since the algorithm has to create a vector of length n. So when the sample
> size is much lesser than n, the algorithm is not effective. Algorithms with
> average complexity O(s log s), were s is the sample size, were described
> long ago. E.g. see
> https://www.degruyter.com/view/j/mcma.1999.5.issue-1/mcma.
> 1999.5.1.39/mcma.1999.5.1.39.xml
> Here the Tree algorithm has complexity O(s log s). I suppose that there
> may be algorithms with complexity close to s. Is somebody planning to
> implement some more effective algorithm?
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

	[[alternative HTML version deleted]]



More information about the R-devel mailing list