[Rd] efficiency of sample() with prob.

Martin Maechler maechler at stat.math.ethz.ch
Fri Jun 24 19:02:58 CEST 2005

>>>>> "Bo" == Bo Peng <ben.bob at gmail.com>
>>>>>     on Fri, 24 Jun 2005 10:32:45 -0500 writes:

    Bo> On 6/24/05, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
    >> `Research' involves looking at all the competitor methods, devising a
    >> near-optimal strategy and selecting amongst methods according to that
    >> strategy.  It is not a quick fix we are looking for but something that
    >> will be good for the long term.

    Bo> I am sorry, but I am afraid that I do not have enough time or
    Bo> background knowledge to do thorough research in this area.

which I think is quite understandable.

    Bo> I have tried the bisection search method and the alias
    Bo> method; the latter has greatly improved the performance
    Bo> of my bioinformatics application. Since this method is
    Bo> the only one mentioned in Knuth's book, I do not know
    Bo> of any other alternatives.
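For readers unfamiliar with the two methods Bo mentions, here is a minimal sketch in C, assuming the probabilities are already normalized to sum to 1. It is neither Bo's attached code (removed from this reply) nor R's own implementation, and all names in it (AliasTable, alias_build, alias_draw, alias_free, bisect_draw) are hypothetical. Bisection inverts the cumulative distribution in O(log n) per draw. The alias method (Walker's, shown here in Vose's construction) spends O(n) once building a table and then needs only O(1) per draw.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical alias table (Walker's method, Vose's construction). */
typedef struct {
    int n;
    double *prob;  /* chance of keeping column k itself */
    int *alias;    /* outcome returned otherwise */
} AliasTable;

/* Release the table's arrays. */
static void alias_free(AliasTable *t)
{
    free(t->prob);
    free(t->alias);
    t->prob = NULL;
    t->alias = NULL;
}

/* Build the table in O(n) from p[0..n-1], assumed to sum to 1.
   Returns 0 on success, -1 if allocation fails. */
static int alias_build(AliasTable *t, const double *p, int n)
{
    assert(n > 0);
    double *scaled = malloc(n * sizeof *scaled);
    int *small = malloc(n * sizeof *small);
    int *large = malloc(n * sizeof *large);
    int ns = 0, nl = 0;

    t->n = n;
    t->prob = malloc(n * sizeof *t->prob);
    t->alias = malloc(n * sizeof *t->alias);
    if (!scaled || !small || !large || !t->prob || !t->alias) {
        free(scaled); free(small); free(large);
        alias_free(t);
        return -1;
    }

    /* Scale so the average column height is 1; split short from tall. */
    for (int i = 0; i < n; i++) {
        scaled[i] = p[i] * n;
        if (scaled[i] < 1.0) small[ns++] = i; else large[nl++] = i;
    }
    /* Top up each short column with mass taken from a tall one. */
    while (ns > 0 && nl > 0) {
        int s = small[--ns], l = large[--nl];
        t->prob[s] = scaled[s];
        t->alias[s] = l;
        scaled[l] = (scaled[l] + scaled[s]) - 1.0;
        if (scaled[l] < 1.0) small[ns++] = l; else large[nl++] = l;
    }
    /* Remaining columns have height 1 up to rounding error. */
    while (nl > 0) { int l = large[--nl]; t->prob[l] = 1.0; t->alias[l] = l; }
    while (ns > 0) { int s = small[--ns]; t->prob[s] = 1.0; t->alias[s] = s; }

    free(scaled); free(small); free(large);
    return 0;
}

/* Draw one outcome in O(1) from a uniform u in [0, 1).
   The integer part of u * n picks a column; the fractional part
   serves as the second uniform for the keep-or-alias decision. */
static int alias_draw(const AliasTable *t, double u)
{
    double x = u * t->n;
    int k = (int)x;
    if (k >= t->n) k = t->n - 1;  /* guard against rounding near u = 1 */
    return (x - k < t->prob[k]) ? k : t->alias[k];
}

/* Inversion by bisection over cum[i] = p[0] + ... + p[i]; O(log n).
   Returns the smallest i with u < cum[i] (n - 1 if rounding leaves none). */
static int bisect_draw(const double *cum, int n, double u)
{
    int lo = 0, hi = n - 1;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (u < cum[mid]) hi = mid; else lo = mid + 1;
    }
    return lo;
}
```

In use, one would build the table once per probability vector and call alias_draw once per sample, feeding it uniforms from any generator (for instance R's unif_rand() in C code). The O(n) setup must be amortized over the draws, which is one reason a claim that the alias method is "always better" needs checking across both the number of categories and the number of samples.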

I think you've also explored the space of possible inputs a bit
and have suggested that the alias method was "uniformly" better
than the current one, i.e. always better, sometimes only
slightly but sometimes considerably (and never worse).  
If this (uniform improvement) can be ``proven'' in some way
{and that may be a considerable "if"; I haven't started to look into it},
and because the algorithm is relatively simple {i.e., there's
not much code added to the current one},
I'd think that we (R-core) should incorporate the algorithm for
the time being, until someone has time for the ``real research''
and provides even better algorithm(s).
I don't see why the phrase
   "the good is the enemy of the better" should apply in this case.

Martin Maechler, ETH Zurich

    Bo> Attached is a slightly improved version of the alias method.

(deleted for this reply).

    Bo> It may be helpful to people having similar problems.

    Bo> Thanks.

    Bo> --
    Bo> Bo Peng
    Bo> Department of Statistics
    Bo> Rice University.
    Bo> http://bp6.stat.rice.edu:8080/
