[R] simulated data using empirical distribution

tom sgouros tomfool at as220.org
Thu Oct 11 13:30:16 CEST 2007


Hello all:

Many thanks to the people who have responded to my question, on and
off-list.  My problem isn't completely solved, though, and perhaps you
can help again.

The problem, again, is that I have what is essentially a histogram, but
not the underlying data, and I want to simulate data that would have
created that histogram.  That is, I have counts for the number of data
points in a dozen bins.  The bins are not of uniform size.  (It's income
data, reported as incomes from 0-10k, 10k-25k, 25k-50k, and so on.)

The best suggestion I had yesterday was to simulate the data with
uniform distributions in each bin, and an exponential one on the
rightmost bin, and I did that and superficially it looks good.
Unfortunately, now that I am trying to calibrate the model, I have
discovered that the simulated incomes are biased high.  Given the way
the bins are chosen, I would expect the true density to slope downward
within 9 of the 12 bins, so approximating each of them with a flat top
puts too much mass near the upper edge of the bin.  I currently suspect
that this accounts for at least part of the bias.
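
For concreteness, a minimal sketch of the approach described above
(uniform within each closed bin, exponential excess above the last
edge).  The bin edges beyond 50k, the counts, the function name
simulate_binned and the tail_mean parameter are all made up for
illustration; they are not the actual data or code.

```r
# Hypothetical bin edges (in dollars) and counts; NOT the real data.
# The last edge is the lower bound of the open-ended top bin.
edges  <- c(0, 10000, 25000, 50000, 75000, 100000)
counts <- c(120, 300, 450, 200, 80, 30)   # one count per bin

simulate_binned <- function(edges, counts, tail_mean = 50000) {
  k <- length(counts)
  stopifnot(length(edges) == k)
  # Closed bins: draw uniformly between the lower and upper edge.
  closed <- unlist(lapply(seq_len(k - 1), function(i) {
    runif(counts[i], min = edges[i], max = edges[i + 1])
  }))
  # Open top bin: lower edge plus an exponential excess
  # (tail_mean is an assumed mean excess, chosen arbitrarily here).
  top <- edges[k] + rexp(counts[k], rate = 1 / tail_mean)
  c(closed, top)
}

set.seed(1)
x <- simulate_binned(edges, counts)
```

By construction the simulated values reproduce the bin counts exactly;
only the placement of the points inside each bin is an assumption, and
that is where a flat top can push values too high.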

Is there a way to ask for a not-quite uniform distribution of random
data?  I imagine a density function with a linear, but not flat, top.  I
admit that the standard selection of distributions in R is more than I
am familiar with, but I can't find one that does what I think I need.
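
One way such a density could be sketched is by inverting its CDF by
hand.  On a bin [a, b], with u = (x - a) / (b - a), take the density
g(u) = 1 + s * (2u - 1), where s in [-1, 1] is a tilt parameter (s < 0
slopes downward, s = 0 is the plain uniform).  The CDF is
G(u) = (1 - s) * u + s * u^2, which is a quadratic that can be solved
for u.  The name rlinear and the parameter s are hypothetical, not
something from base R.

```r
# Draw n values on [a, b] from a density with a linear (tilted) top.
# s is an assumed tilt parameter in [-1, 1]:
#   s = 0 gives the uniform, s < 0 puts more mass near a, s > 0 near b.
rlinear <- function(n, a, b, s = 0) {
  stopifnot(b > a, abs(s) <= 1)
  p <- runif(n)
  # Solve s*u^2 + (1 - s)*u - p = 0 for u in [0, 1], written in a form
  # that also works when s = 0 (it then reduces to u = p).
  u <- 2 * p / ((1 - s) + sqrt((1 - s)^2 + 4 * s * p))
  a + (b - a) * u
}

set.seed(1)
y <- rlinear(1e5, a = 10000, b = 25000, s = -0.6)
```

Under this parameterization the mean within the bin is
a + (b - a) * (1/2 + s/6), so a downward tilt pulls the bin mean below
the midpoint that the uniform would give.  How to pick s for each bin
(for example from the neighbouring bin heights) is left open here.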

Any advice (R advice or statistics advice) is welcome.  Thanks again,

 -tom

-- 
 ------------------------
 tomfool at as220 dot org
 http://sgouros.com  
 http://whatcheer.net


