[Rd] algorithm reference for sample()
Martin Maechler
maechler at stat.math.ethz.ch
Fri Sep 24 10:02:53 CEST 2004
Hi Vadim,
>>>>> "Vadim" == Vadim Ogranovich <vograno at evafunds.com>
>>>>> on Thu, 23 Sep 2004 17:48:45 -0700 writes:
Vadim> Hi, Don't know if it belongs to r-devel or r-help,
Vadim> but since I am planning to alter some of R's internal code
i.e., you will propose a patch to the R sources eventually ?
Vadim> I am sending it here.
good choice. Also, since you are talking about
internal (non-API) C code from R - which I would deem
inappropriate on R-help.
Vadim> The existing implementation of the sample() function,
Vadim> when the optional 'prob' argument is given, is quite
Vadim> inefficient. The complexity is O(sampleSize *
Vadim> universeSize), see ProbSampleReplace() and
Vadim> ProbSampleNoReplace() in random.c. This makes the
Vadim> function impractical for the vector sizes I use.
I'm interested: What problem are you solving where sample() is
the bottleneck (rather than what you *do* with the sample ..)
Vadim> I want to re-code these functions and I "think" I can
Vadim> come up with a more efficient algorithm.
I agree. It's a kind of table lookup, that definitely can be
made faster e.g. by bisection ideas.
Vadim> However before I go and reinvent the wheel I wonder if there
Vadim> is a published description of an efficient sampling
Vadim> algorithm with user-specified probabilities?
I've got some ideas, but maybe would first want to get a reply to the
current ones.
Vadim> Thanks, Vadim
Vadim> [[alternative HTML version deleted]]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(you *did* read the posting guide or just the general
instructions on http://www.R-project.org/mail.html ?
)
Martin Maechler
More information about the R-devel
mailing list