[R] simulated data using empirical distribution

Daniel Lakeland dlakelan at street-artists.org
Thu Oct 11 19:44:49 CEST 2007


On Thu, Oct 11, 2007 at 07:30:16AM -0400, tom sgouros wrote:
> 
> Hello all:
> 
> Many thanks to the people who have responded to my question, on and
> off-list.  My problem isn't completely solved, though, and perhaps you
> can help again.
> 
> The problem, again, is that I have what is essentially a histogram, but
> not the underlying data, and I want to simulate data that would have
> created that histogram. 
...
> The way the bins are chosen, I would expect
> that 9 out of 12 bins have a down-ward slope, meaning that approximating
> them with a square top gives me more along the high border of the bin,
> and I currently suspect that this is at least part of the bias.

Did you try smoothing your simulated sample by adding normally
distributed noise, with a standard deviation of maybe 5000 dollars?
By diffusion, this will automatically eliminate some of this effect.

If "mysample" is the sample you simulated with uniform random
numbers, then

mysample + rnorm(NROW(mysample),0,5000)

will be your smoothed sample. 
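
As a minimal sketch of the whole smoothing step, assuming hypothetical
bin edges and counts (the names "breaks" and "counts" and their values
are made up for illustration, they are not your data):

```r
set.seed(1)

# Hypothetical histogram: bin edges in dollars and counts per bin (assumed values)
breaks <- c(0, 10000, 20000, 30000, 50000)
counts <- c(40, 30, 20, 10)

# Uniform-within-bin sample: assign each draw to a bin, then draw a
# uniform point between that bin's lower and upper edge
bin <- rep(seq_along(counts), times = counts)
mysample <- runif(length(bin), min = breaks[bin], max = breaks[bin + 1])

# Smooth by adding normal noise with a 5000 dollar standard deviation
smoothed <- mysample + rnorm(NROW(mysample), 0, 5000)
```

Note that the added noise can push values near the lowest bin edge
below zero, so you may need to handle those if negative values make no
sense for your data.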

Another thing that's possibly going on is that the exponential tail is
too right-skewed for your data. You could try generating an
exponential with a faster decay rate for this final bin.
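
A rough sketch of what that could look like, assuming the final bin is
open-ended and starts at a hypothetical lower edge "tail_start"; the
edge, the count, and both rates are made-up values:

```r
set.seed(2)
tail_start <- 100000   # hypothetical lower edge of the final, open-ended bin
n_tail <- 1000         # hypothetical number of observations in that bin

# The mean excess over tail_start is 1/rate, so a larger rate decays faster
slow_tail <- tail_start + rexp(n_tail, rate = 1 / 50000)
fast_tail <- tail_start + rexp(n_tail, rate = 1 / 20000)

# Compare how far each tail reaches above the bin edge on average
mean(slow_tail) - tail_start
mean(fast_tail) - tail_start
```

The faster-decaying version puts less mass far out to the right, which
is the direction to move in if the simulated tail is too heavy.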

> Is there a way to ask for a not-quite uniform distribution of random
> data?  I imagine a density function with a linear, but not flat, top.

This is possible, but not "out of the box". You could apply a
mathematical transformation to your uniform draws, or use rejection
sampling.
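
Here is one way the rejection approach might look for a single bin.
The function name "rlinear_top" and its arguments are hypothetical;
h_a and h_b are the relative heights you assume for the density at the
two edges of the bin:

```r
# Draw n values on [a, b] from a density whose top is a straight line
# running from relative height h_a at a to h_b at b (hypothetical helper)
rlinear_top <- function(n, a, b, h_a, h_b) {
  stopifnot(b > a, h_a >= 0, h_b >= 0, max(h_a, h_b) > 0)
  out <- numeric(0)
  h_max <- max(h_a, h_b)
  while (length(out) < n) {
    x <- runif(n, a, b)                          # propose uniformly over the bin
    h <- h_a + (h_b - h_a) * (x - a) / (b - a)   # height of the line at x
    keep <- runif(n, 0, h_max) < h               # accept with probability h / h_max
    out <- c(out, x[keep])
  }
  out[seq_len(n)]
}

set.seed(3)
# Downward slope: twice as dense at the low edge as at the high edge
x <- rlinear_top(10000, a = 20000, b = 30000, h_a = 2, h_b = 1)
mean(x < 25000)  # more than half the draws fall in the lower half of the bin
```

You would call this once per bin, with the slope guessed from the
heights of the neighbouring bins.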

Perhaps an easier way for you to do the whole thing is to fit a gamma
distribution. The gamma is a very flexible family of distributions,
especially if you include a location parameter.

You can simply plot your histogram, and then overplot a gamma density
using dgamma. Play with the shape and scale parameters until you get a
"by eye" fit that's good enough for your purposes.

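A sketch of the by-eye overplot, working directly from bin edges and
counts since you don't have the underlying data. The values of
"breaks" and "counts", the helper name "overplot_gamma", and the shape
and scale tried at the end are all assumptions for illustration:

```r
# Hypothetical histogram summary (assumed values, not your data)
breaks <- c(0, 10000, 20000, 30000, 50000, 100000)
counts <- c(15, 30, 25, 20, 10)

# Convert counts to density heights so the bars and the gamma curve
# are on the same scale (bar areas sum to 1)
dens <- counts / (sum(counts) * diff(breaks))

# Draw the bars, then overplot a gamma density with the given parameters
overplot_gamma <- function(shape, scale) {
  plot(range(breaks), c(0, max(dens) * 1.2), type = "n",
       xlab = "dollars", ylab = "density")
  rect(head(breaks, -1), 0, tail(breaks, -1), dens, col = "grey")
  curve(dgamma(x, shape = shape, scale = scale), add = TRUE, lwd = 2)
}

# Try a few values and adjust until the curve tracks the bars
if (interactive()) overplot_gamma(shape = 2, scale = 15000)
```
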
You can get a good starting point by maximum likelihood fitting to a
sample generated by the uniform-within-bin method you've already
tried. Use "fitdistr" from the MASS package.

library(MASS)
fitdistr(mysample,"gamma") # parameters for your starting point
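
A slightly fuller sketch of the same idea. Here "mysample" is
simulated from a known gamma only so that the example runs on its own;
in practice it would be your uniform-within-bin sample. Rescaling to
thousands of dollars is my own assumption, made because the optimizer
tends to behave better when the parameters are of moderate size:

```r
library(MASS)
set.seed(4)

# Stand-in for the uniform-within-bin sample (simulated so the sketch runs)
mysample <- rgamma(5000, shape = 2, scale = 15000)

# Fit in thousands of dollars, then convert the rate back to a dollar scale
fit <- fitdistr(mysample / 1000, "gamma")
shape0 <- unname(fit$estimate["shape"])
scale0 <- 1000 / unname(fit$estimate["rate"])

# Starting point for the by-eye dgamma overplot
c(shape = shape0, scale = scale0)
```

fitdistr reports the gamma parameters as "shape" and "rate", so the
scale that dgamma can take is 1/rate.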



-- 
Daniel Lakeland
dlakelan at street-artists.org
http://www.street-artists.org/~dlakelan


