[R] Distinct combinations for bootstrapping small sets

S Ellison S.Ellison at lgc.co.uk
Tue Mar 6 16:54:44 CET 2007


Small data sets (6-12 values, or a similarly small number of groups) which don't look nice and symmetric are quite common in my field (analytical chemistry and biological variants thereof), and often contain outliers or at least stragglers that I cannot simply discard. One of the things I occasionally do when I want to see what different assumptions do to my confidence intervals is to run a quick nonparametric bootstrap, just to get a feel for how asymmetric the distribution of any estimates might be. At the moment, I'm also interested in doing that on some historical data to evaluate some proposed estimators for interlab studies.

boot() is pretty good, but with such small sets there aren't really many distinct resampled combinations (e.g. 92378 for 10 data points), so I'm really resampling from quite a small population of possible bootstrap samples. It's surely more efficient to generate all the distinct resampled combinations of the data set, and use those and their frequencies to get things like the bootstrap variance exactly. At worst, that will stop us fooling ourselves into thinking that more replicates will give better information.
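For reference, here is the counting behind the 92378 figure, sketched under the assumption that the n data points are treated as distinct labels (tied values are not merged). A resample of size n drawn with replacement is fixed by its multiplicities m_1, ..., m_n, and each distinct resample carries a multinomial probability, which gives the exact bootstrap variance of a statistic T:

```latex
\text{number of distinct resamples} = \binom{2n-1}{n}, \qquad \binom{19}{10} = 92378 \text{ for } n = 10

P(m_1,\dots,m_n) = \frac{n!}{m_1!\,m_2!\cdots m_n!}\; n^{-n}, \qquad \sum_{i=1}^{n} m_i = n

\bar{T} = \sum_{m} P(m)\,T(m), \qquad \operatorname{Var}^{*}(T) = \sum_{m} P(m)\,\bigl(T(m) - \bar{T}\bigr)^{2}
```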

A lengthy dig around R-help and CRAN drew a blank on generating distinct combinations with replacement (i.e. distinct resamples), so I've written a couple of routines to generate the distinct combinations and their frequencies. They work, though I wouldn't guarantee great efficiency. But if a chemist (me) can think of it, it's pretty certain that a statistician already has. Before I spend hours polishing code, is there already something out there I've missed?
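A minimal sketch of the kind of routine described above. It is written in Python rather than R, and it is neither the author's code nor an existing CRAN function. It enumerates index multisets with itertools.combinations_with_replacement, weights each by its multinomial count (the number of the n^n equally likely ordered resamples that produce it), and uses those weights to compute the exact bootstrap variance of a statistic. The names distinct_resamples and exact_bootstrap_variance are hypothetical, and tied data values are assumed to count as distinct observations.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial, prod
from statistics import mean


def distinct_resamples(data):
    """Yield (resample, weight) for every distinct bootstrap resample.

    Resamples are multisets of *indices*, so tied values are treated as
    distinct observations. The weight is the multinomial count
    n! / (m_1! ... m_n!), i.e. how many of the n**n ordered draws give
    this multiset; the weights therefore sum to n**n.
    """
    n = len(data)
    for idx in combinations_with_replacement(range(n), n):
        multiplicities = Counter(idx).values()
        weight = factorial(n) // prod(factorial(m) for m in multiplicities)
        yield [data[i] for i in idx], weight


def exact_bootstrap_variance(data, stat=mean):
    """Exact (non-Monte-Carlo) bootstrap variance of stat over all resamples."""
    n = len(data)
    total = n ** n  # number of equally likely ordered resamples
    pairs = [(stat(s), w) for s, w in distinct_resamples(data)]
    centre = sum(t * w for t, w in pairs) / total
    return sum(w * (t - centre) ** 2 for t, w in pairs) / total


if __name__ == "__main__":
    print(exact_bootstrap_variance([1, 2, 3, 4]))
```

For the mean, the result should equal the plug-in variance divided by n (for [1, 2, 3, 4] that is 1.25/4 = 0.3125), which makes a handy check before trying statistics such as the median. The enumeration grows quickly (1352078 distinct resamples for n = 12), so it only pays off for the small sets discussed here.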

Steve Ellison