[R] Nonparametric bivariate distribution estimation and sampling

heyi xiao xiaoheyiyh at yahoo.com
Fri Mar 23 20:55:36 CET 2012


David,
Thanks a lot for the specific suggestions; they are very helpful. My question 1 is fully answered now. I guess I was not clear enough about question 2. I would like to generate a random sample using the estimated probability density (the result of question 1) as the reference distribution. Say I get a matrix of estimated densities (at some grid points) from MASS::kde2d. How can I use that result as a reference distribution to sample data from? I know this is trivial for parametric distributions like the bivariate normal, but what about such a nonparametric bivariate reference distribution? Are there any particular procedures or functions I can use?
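For reference, one way the grid output of MASS::kde2d could be used directly as a sampling distribution is sketched here. This is only a sketch under assumptions: the data `x` and `y` are simulated stand-ins for the preliminary dataset, and `n_new`, `cell`, `idx` and `sim` are illustrative names. Grid cells are drawn with probability proportional to the estimated density, and each draw is then jittered uniformly within its cell so that the points do not sit exactly on the grid nodes.

```r
library(MASS)  # provides kde2d

set.seed(1)
# Hypothetical stand-in for the preliminary bivariate data
x <- rnorm(200)
y <- 0.5 * x + rnorm(200, sd = 0.5)

# Density estimate on a 100 x 100 grid;
# dens$z[i, j] is the estimated density at (dens$x[i], dens$y[j])
dens <- kde2d(x, y, n = 100)

# Draw grid cells with probability proportional to the estimated density
n_new <- 5000
cell <- sample(length(dens$z), n_new, replace = TRUE,
               prob = as.vector(dens$z))
idx <- arrayInd(cell, dim(dens$z))  # column 1: x index, column 2: y index

# Jitter uniformly within each cell so points are not stuck on grid nodes
dx <- diff(dens$x[1:2])
dy <- diff(dens$y[1:2])
sim <- data.frame(
  x = dens$x[idx[, 1]] + runif(n_new, -dx / 2, dx / 2),
  y = dens$y[idx[, 2]] + runif(n_new, -dy / 2, dy / 2)
)

plot(sim, pch = ".")  # scatter plot of the simulated sample
```

The result is only as fine as the grid, so a larger `n` in kde2d gives a closer approximation to the smooth estimate, and the sample is confined to the grid limits (by default the range of the data).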
The reason I don’t want to use sample() is this: to draw more points than I have, I must sample with replacement (sampling without replacement cannot return more data than I have), and that generates lots of duplicate data points when I want a bigger dataset but my raw data do not have a large sample size. The scatter plot of data sampled this way doesn’t look good.
Heyi
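A possible way around the duplicate-points problem is the smoothed bootstrap: resample the observed points with replacement and add kernel-shaped noise to each draw, which amounts to sampling from a Gaussian kernel density estimate without building a grid. The sketch below makes several assumptions: the data are simulated stand-ins, the names `h`, `pick`, `n_new` and `sim` are illustrative, and the noise standard deviation of h/4 relies on my reading of the kde2d source, which appears to divide its bandwidth by 4 before applying a Gaussian kernel.

```r
library(MASS)  # provides bandwidth.nrd, the default bandwidth rule of kde2d

set.seed(2)
# Hypothetical stand-in for the preliminary bivariate data
x <- rnorm(200)
y <- 0.5 * x + rnorm(200, sd = 0.5)

# Same default bandwidths that kde2d would pick for these data
h <- c(bandwidth.nrd(x), bandwidth.nrd(y))

# Step 1: ordinary bootstrap draw of row indices (with replacement)
n_new <- 5000
pick <- sample(length(x), n_new, replace = TRUE)

# Step 2: add independent Gaussian noise to each coordinate;
# sd = h / 4 is an assumption based on kde2d's internal scaling
sim <- data.frame(
  x = x[pick] + rnorm(n_new, sd = h[1] / 4),
  y = y[pick] + rnorm(n_new, sd = h[2] / 4)
)

plot(sim, pch = ".")  # no exact duplicates, unlike a plain bootstrap
```

Because the noise is continuous, repeated draws of the same original point land at different locations, so the scatter plot no longer shows stacked duplicates.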


--- On Fri, 3/23/12, David Winsemius <dwinsemius at comcast.net> wrote:

> From: David Winsemius <dwinsemius at comcast.net>
> Subject: Re: [R] Nonparametric bivariate distribution estimation and sampling
> To: "heyi xiao" <xiaoheyiyh at yahoo.com>
> Cc: "Sarah Goslee" <sarah.goslee at gmail.com>, r-help at r-project.org
> Date: Friday, March 23, 2012, 2:20 PM
> 
> On Mar 23, 2012, at 1:53 PM, heyi xiao wrote:
> 
> > Sarah,
> > Thanks for the response. I actually have several years of working experience with R and statistics, although I may not be as good as you; that’s why I am here ;) I dug deeper into the R documentation and previous R-help posts, and couldn’t find anything particularly helpful. Again, I want to do two things: (1) estimate the probability density of this bivariate distribution using some nonparametric method (kernel, spline, etc.);
> 
> ?MASS::kde2d
> ?KernSmooth::bkde2D
> ?ade4::s.kde2d
> help(package=locfit)
> 
> > (2) sample a big dataset from this bivariate distribution for a simulation study.
> 
> What is wrong with `sample`?
> 
> # to get a sample of n rows without replacement (n must not exceed NROW(dfrm))
> set.seed(42)
> dfrm[ sample(seq_len(NROW(dfrm)), n) , ]
> 
> --David.
> > If my questions are not clear enough, show me how I can improve them, or which part is not clear enough. If you have any particular suggestions/comments, you are more than welcome. Thanks!
> > Heyi
> > 
> > 
> > --- On Fri, 3/23/12, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> > 
> >> From: Sarah Goslee <sarah.goslee at gmail.com>
> >> Subject: Re: [R] Nonparametric bivariate distribution estimation and sampling
> >> To: "heyi xiao" <xiaoheyiyh at yahoo.com>
> >> Cc: r-help at r-project.org
> >> Date: Friday, March 23, 2012, 12:26 PM
> >> R can do all of that and more.
> >> 
> >> But you'll need to put some work in reading about how to use R, about
> >> the statistical methods involved, and about how to use them to best
> >> effect. You might want, for instance, generalized additive models. Or
> >> not. If your question isn't more fully-formed than this, your best bet
> >> is almost certainly to talk to a local statistician, spend some time
> >> working with R, and then come back to the list with specific
> >> questions.
> >> 
> >> Sarah
> >> 
> >> On Fri, Mar 23, 2012 at 12:17 PM, heyi xiao <xiaoheyiyh at yahoo.com> wrote:
> >>> Dear all,
> >>> I have a bivariate dataset from a preliminary study. I want to do two things: (1) estimate the probability density of this bivariate distribution using some nonparametric method (kernel, spline, etc.); (2) sample a big dataset from this bivariate distribution for a simulation study.
> >>> Is there any good method or package I can use in R for my work? I don’t want parametric models like the bivariate normal distribution, as I would like to model my data accurately. I don’t want to use the bootstrapping approach, i.e. sampling with replacement, as this will generate lots of duplicate data points. Any thoughts or input will be highly appreciated!
> >>> Heyi
> >>> 
> >>> 
> >> 
> >> --Sarah Goslee
> >> http://www.functionaldiversity.org
> >> 
> > 
> 
> David Winsemius, MD
> West Hartford, CT
> 
>


