[R] how to generate random data from an empirical distribution

Greg Snow Greg.Snow at imail.org
Wed Jul 28 01:36:31 CEST 2010


If they want to generate directly from the empirical distribution, then sampling with replacement is the best choice (others have already suggested that).  But the reference to the normal and beta distributions in the original post suggested to me that the original poster may want a smooth approximation to the empirical distribution rather than the step function, without being locked into a specific parametric family.  The logspline package has functions for this.  Its advantages are that it gives a smooth (non-step) estimate of the cdf that can be plotted, and that the points it generates are based on the observed data but can fall outside the original range of the data and will have fewer ties.
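
A minimal sketch of the logspline approach described above, assuming the logspline package is installed from CRAN. The vector x is simulated stand-in data (the original poster's 10,000 points are not shown in the thread), and the gamma distribution used to simulate it is purely illustrative.

```r
library(logspline)  # CRAN package; install.packages("logspline") if needed

set.seed(1)
# Hypothetical stand-in for the 10,000 observed data points
x <- rgamma(10000, shape = 2, rate = 1)

# Fit a smooth density estimate to the observed data
fit <- logspline(x)

# Draw 20,000 new values from the fitted smooth distribution
new_x <- rlogspline(20000, fit)

# Evaluate the estimated cdf on a grid; unlike ecdf(x) it is not a step function
grid <- seq(min(x), max(x), length.out = 200)
cdf_hat <- plogspline(grid, fit)
plot(grid, cdf_hat, type = "l", xlab = "x", ylab = "Estimated cdf")
```

If the data are known to be bounded (strictly positive, for example), logspline() also accepts lbound and ubound arguments that keep the fitted distribution within those limits.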

Whether these "advantages" make any difference depends on what they want to do with the generated observations.  For many applications the difference is probably negligible, and using sample() is the simplest and best approach, but there may be some uses for which they are beneficial.  (Using sample() and then adding a small random "error" to each value is another option, but I like the logspline option better.)
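
For comparison, the two simpler options mentioned above can be sketched in base R. Again x is hypothetical stand-in data, and the noise scale used here (the default kernel bandwidth from bw.nrd0()) is only one reasonable assumption for what "small" means; nothing in the thread specifies it.

```r
set.seed(1)
# Hypothetical stand-in for the 10,000 observed data points
x <- rgamma(10000, shape = 2, rate = 1)

# Option 1: draw directly from the empirical distribution
# (sampling with replacement, the idea behind the bootstrap)
boot_x <- sample(x, size = 20000, replace = TRUE)

# Option 2: resample, then add a small random "error" to each value.
# The noise standard deviation h is an assumed choice, not from the thread.
h <- bw.nrd0(x)
smooth_x <- sample(x, size = 20000, replace = TRUE) +
  rnorm(20000, mean = 0, sd = h)

# Every value from option 1 is one of the original points, so ties are
# unavoidable when drawing 20,000 values from 10,000 points
all(boot_x %in% x)
anyDuplicated(boot_x) > 0
```

With normal noise and this choice of h, option 2 amounts to sampling from a Gaussian kernel density estimate of the data, which is why it can also produce values outside the observed range.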

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: Frank Harrell [mailto:f.harrell at vanderbilt.edu]
> Sent: Tuesday, July 27, 2010 4:54 PM
> To: Greg Snow
> Cc: xin wei; r-help at r-project.org
> Subject: Re: [R] how to generate a random data from a empirical
> distribition
> 
> Easiest thing is to sample with replacement from the original data.
> This is the idea behind the bootstrap, which is sampling from the
> empirical CDF.
> 
> Frank E Harrell Jr   Professor and Chairman        School of Medicine
>                       Department of Biostatistics   Vanderbilt University
> 
> On Tue, 27 Jul 2010, Greg Snow wrote:
> 
> > Another option for fitting a smooth distribution to data (and
> > generating future observations from the smooth distribution) is to use
> > the logspline package.
> >
> >
> >
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of xin wei
> >> Sent: Monday, July 26, 2010 12:36 PM
> >> To: r-help at r-project.org
> >> Subject: [R] how to generate a random data from a empirical
> >> distribition
> >>
> >>
> >> Hi, this is more a statistical question than an R question, but I do
> >> want to know how to implement this in R.
> >> I have 10,000 data points. Is there any way to generate an empirical
> >> probability distribution from them? (The problem is that I do not know
> >> exactly what distribution they follow: normal, beta?) My ultimate goal
> >> is to generate an additional 20,000 data points from this empirical
> >> distribution created from the existing 10,000 data points.
> >> Thank you all in advance.
> >>
> >>
> >> --
> >> View this message in context: http://r.789695.n4.nabble.com/how-to-generate-a-random-data-from-a-empirical-distribition-tp2302716p2302716.html
> >> Sent from the R help mailing list archive at Nabble.com.
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >


