[Rd] efficiency of sample() with prob.

Bo Peng ben.bob at gmail.com
Tue Aug 30 15:44:50 CEST 2005


> You chose to report just one extremely favourable example, <ignore>
> I do think you are being `careless with the truth'.

I chose to report whatever I got and whatever I felt the result was. It was not
a scientific report, and it was up to you (the R team) to validate my result and
investigate further. When I was asked to do more thorough research in this area
(and thus take responsibility for my results), I declined because I did not have
enough expertise or time.

After all, I could have chosen not to report anything at all. If this mailing
list only accepts serious contributions from professionals in the field, it is
time for me to quit.

> Hmm.  A sample every 100ns and generating samples several times faster
> than tabulating them is something to be impressed by. <ignore>
> Not in our tests.  Did you try my 5 out of 10,000 non-zero probabilities
> distribution, for example?

No, I did not try it. Intuitively, Walker's method may be slow because of the
memory allocation and table setup it requires, so it should be used when the
sample is large and the prob vector is small. The bisection method should be
quicker when the prob vector is large (so linear search is slow) and the sample
size is relatively small. I thought bisection would be uniformly quicker than
linear search, since it has no additional cost other than a few more variables.
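A minimal Python sketch of the two inversion strategies compared above, assuming sampling with replacement from a cumulative probability vector. The function names are hypothetical and this is not R's C code. Both functions map the same uniform draw to the same category, so they differ only in cost per draw: the linear scan is O(n) in the worst case and bisection is O(log n), once the O(n) cumulative sums have been built. Python timings say little about a compiled implementation, so this only illustrates the logic.

```python
import bisect
import itertools
import random


def cumulative(prob):
    """Normalise prob and return its running sums (last entry is ~1.0)."""
    total = sum(prob)
    return list(itertools.accumulate(p / total for p in prob))


def sample_linear(cum, size, rng):
    """Inversion by linear scan: first j with u < cum[j], O(n) per draw."""
    out = []
    for _ in range(size):
        u = rng.random()
        j = 0
        # The j < len(cum) - 1 guard covers cum[-1] rounding to just below 1.
        while j < len(cum) - 1 and u >= cum[j]:
            j += 1
        out.append(j)
    return out


def sample_bisect(cum, size, rng):
    """Inversion by bisection: same answer as the linear scan, O(log n) per draw."""
    last = len(cum) - 1
    # bisect_right returns the first index whose cumulative sum exceeds u.
    return [min(bisect.bisect_right(cum, rng.random()), last) for _ in range(size)]
```

With identically seeded generators, e.g. `sample_linear(cum, 1000, random.Random(1))` and `sample_bisect(cum, 1000, random.Random(1))`, the two return the same indices, and zero-probability categories are never drawn.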

If you have done enough experiments and know the fastest method in each case,
it should be straightforward to implement a hybrid method that chooses the best
method based on the lengths of the sample and of the prob vector. I might have
been reporting partial results, but the 80-fold speed-up in some cases was not
faked.
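A sketch of the hybrid dispatch suggested above, in Python with hypothetical names. The cutoff `ALIAS_MIN_DRAWS_PER_CATEGORY` is a placeholder, not a benchmarked value; real thresholds would have to come from measurements of the compiled code. The alias tables follow Vose's variant of Walker's method, which may differ in detail from the implementations discussed on the list.

```python
import bisect
import itertools
import random

# Hypothetical tuning constant, NOT a measured value.
ALIAS_MIN_DRAWS_PER_CATEGORY = 1.0


def build_alias(prob):
    """Walker alias tables (Vose's variant): O(n) setup, O(1) per draw."""
    n = len(prob)
    total = sum(prob)
    scaled = [p * n / total for p in prob]
    accept = [1.0] * n
    alias = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        accept[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]  # l fills the rest of column s
        (small if scaled[l] < 1.0 else large).append(l)
    # Leftovers have scaled ~= 1 up to rounding, so accept[i] stays 1.0.
    return accept, alias


def sample_alias(prob, size, rng):
    accept, alias = build_alias(prob)
    n = len(prob)
    out = []
    for _ in range(size):
        i = rng.randrange(n)  # pick a column uniformly
        out.append(i if rng.random() < accept[i] else alias[i])
    return out


def sample_bisect(prob, size, rng):
    total = sum(prob)
    cum = list(itertools.accumulate(p / total for p in prob))
    last = len(cum) - 1
    return [min(bisect.bisect_right(cum, rng.random()), last) for _ in range(size)]


def sample_hybrid(prob, size, rng=random):
    """Choose a method from the lengths of the sample and of prob."""
    if size >= ALIAS_MIN_DRAWS_PER_CATEGORY * len(prob):
        return sample_alias(prob, size, rng)  # many draws: amortise table setup
    return sample_bisect(prob, size, rng)  # few draws from a long prob vector
```

The design choice mirrors the argument in the text: the alias method pays an O(n) table setup that is only worth it when many draws share it, while bisection needs just the cumulative sums.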

Cheers,
Bo



More information about the R-devel mailing list