[R] simulated data using empirical distribution

Daniel Lakeland dlakelan at street-artists.org
Wed Oct 10 16:44:57 CEST 2007


On Wed, 2007-10-10 at 10:13 -0400, Tom Sgouros wrote:
> Hello all:
> 
> I'm sure this is a trivial request, but I'm still a beginner at this,
> and haven't been able to find it.  I need to create simulated data based
> on some empirical distributions of a single variable.  I've found R
> functions to help me simulate data based on analytical distributions, or
> to make simulations based on correlation matrices, but nothing so simple
> as what I need.  What I have is twelve bins of data, and the population
> in each bin.  The top bin is open-ended, and the whole distribution is
> more or less poisson-ish.


If you have a bin with n items in it, you can generate n uniform random
numbers within the range of that bin to "reconstruct" your sample (I'm
assuming that you don't have the sample itself, just the histogram).
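
A minimal sketch of that reconstruction for the closed bins. The bin
edges and counts here are made up for illustration (they are not from
the original question), and the names breaks, counts and mysample are
just placeholders:

```r
# Hypothetical histogram: edges and counts are invented for illustration.
set.seed(1)
breaks <- c(0, 1, 2, 3, 4, 5)     # edges of five closed bins
counts <- c(12, 30, 25, 10, 3)    # population of each bin

lower <- head(breaks, -1)         # lower edge of each bin
upper <- tail(breaks, -1)         # upper edge of each bin

# Draw counts[i] uniform values inside bin i; runif() is vectorized
# over min and max, so repeating the edges once per item is enough.
mysample <- runif(sum(counts),
                  min = rep(lower, counts),
                  max = rep(upper, counts))
```

Re-binning mysample with hist(mysample, breaks = breaks) should give
back the same counts, which is a quick sanity check.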

For the open ended bin, you could generate something like an
exponentially distributed random number with a shift to fit it into the
bin.
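
Something along these lines might do for the open-ended bin. The lower
edge, the count and especially the rate are assumptions of mine; the
histogram alone does not tell you how fast the tail decays, so pick the
rate from whatever else you know about the data:

```r
set.seed(2)
top_lower <- 5    # hypothetical lower edge of the open-ended bin
top_count <- 4    # hypothetical population of that bin
top_rate  <- 1    # assumed decay rate of the tail (a guess, not fitted)

# Exponential draws start at 0, so adding the lower edge shifts them
# into the open-ended bin.
top_values <- top_lower + rexp(top_count, rate = top_rate)
```

These values can then be appended to the uniform draws from the closed
bins with c().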

Now you'll have a sample whose distribution is very similar to your
histogram. You can generate bootstrap samples by simply resampling this
sample with replacement. You can also smooth these bootstrap samples by
sampling with replacement and then adding a small Gaussian random noise
to each draw, such as

sample(mysample,size=100,replace=TRUE) + rnorm(100,0,.01)

You may want to make the standard deviation of the normal smoothing
proportional to the size of your bins (perhaps 1/2 or 1/4 the width of
the bin).
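
A sketch of the smoothed bootstrap with the noise tied to the bin
width. The stand-in sample, the bin width of 1 and the factor of 1/4
are all assumptions for illustration:

```r
set.seed(3)
mysample  <- runif(80, min = 0, max = 5)  # stand-in for the reconstructed sample
bin_width <- 1                            # hypothetical width of each bin
n_boot    <- 100                          # size of one bootstrap sample

# Resample with replacement, then jitter each draw with Gaussian noise
# whose standard deviation is a quarter of the bin width.
boot <- sample(mysample, size = n_boot, replace = TRUE) +
  rnorm(n_boot, mean = 0, sd = bin_width / 4)
```

Note that the noise can push values slightly outside the original
range (below zero, for instance), so truncate or redraw if that
matters for your variable.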

Also, once you have a sample, you can fit a Poisson distribution to your
sample and then use the fitted parameter to generate Poisson random
numbers, which may approximate your distribution well.
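
A rough sketch of the Poisson route. It assumes the variable is really
a non-negative count, so the reconstructed values are rounded down to
integers first (that rounding is my assumption, as is the stand-in
sample). The maximum-likelihood estimate of the Poisson rate is just
the sample mean:

```r
set.seed(4)
mysample <- runif(80, min = 0, max = 5)  # stand-in for the reconstructed sample

# A Poisson variable takes integer values, so floor the continuous
# reconstruction (assumes the underlying variable is a count).
count_like <- floor(mysample)

# The MLE of the Poisson rate is the sample mean.
lambda_hat <- mean(count_like)

# Simulate as many new observations as needed from the fitted model.
simulated <- rpois(1000, lambda = lambda_hat)
```

Comparing table(simulated) against the original bin counts is a
reasonable check on whether the Poisson shape is close enough.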


