[R] Fitting a distribution to peaks in histogram

Berton Gunter gunter.berton at gene.com
Wed Jul 19 19:10:02 CEST 2006


With this much data, I think it makes more sense to fit a nonparametric
density estimate. ?density does this via a kernel density procedure, but
RSiteSearch('nonparametric density') will find many alternatives. The ash
(average shifted histograms) and mclust (model-based, via Gaussian mixtures)
packages are two that come to mind, but there are certainly others.
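
For what it's worth, here is a minimal sketch of the kernel density route in
base R. The data are simulated stand-ins (the real events are not shown in
this thread), the 10% cutoff for ignoring minor bumps is an arbitrary choice
of mine, and the names peaks, f and area12 are only illustrative.

```r
# Simulated stand-in for ~15000 unbinned events (hypothetical data).
set.seed(1)
x <- c(rnorm(5000, mean = 2, sd = 0.5),
       rnorm(6000, mean = 5, sd = 0.7),
       rnorm(4000, mean = 9, sd = 1.0))

# Gaussian kernel density estimate with the default bandwidth rule.
d <- density(x, n = 1024)

# Local maxima of the estimated curve: the slope changes from + to -.
peak_idx <- which(diff(sign(diff(d$y))) == -2) + 1
# Ignore minor bumps (the 10% cutoff is an arbitrary assumption).
peak_idx <- peak_idx[d$y[peak_idx] > 0.1 * max(d$y)]
peaks <- d$x[peak_idx]

# Linear interpolation of the estimate so integrate() can evaluate it.
f <- approxfun(d$x, d$y, yleft = 0, yright = 0)

# Estimated probability mass between the first two peaks.
area12 <- integrate(f, lower = peaks[1], upper = peaks[2],
                    subdivisions = 1000L)$value
```

The number of peaks found this way depends on the bandwidth, so it is worth
trying a few values of the bw or adjust arguments (bw = "SJ", for example)
before trusting the result.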

Of course, if you must have a parametric fit, then you'll have to fit a
mixture of some sort. But when both the number of components and the
individual component distributions have to be estimated, this is a nontrivial
problem: one runs into identifiability issues and the corresponding
convergence problems. Venables and Ripley's discussion of density estimation
in MASS has some useful things to say about these issues, and Ripley's book,
"Pattern Recognition and Neural Networks", has even more. As both sources
indicate, there is a large literature on this issue and much software.
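
To make the parametric route concrete, here is a rough sketch in base R only.
It assumes Gaussian components (chosen for simplicity, not the gammas
suggested in the quoted message), fits them by a plain EM loop for each fixed
number of components, and compares the fits by BIC. The data are simulated,
and fit_gmm is a hypothetical helper written for this illustration, not a
function from any package.

```r
# Simulated stand-in data (hypothetical; the thread's data are not available).
set.seed(1)
x <- c(rnorm(500, 2, 0.5), rnorm(600, 5, 0.7), rnorm(400, 9, 1.0))

# Weighted component densities as an n x k matrix.
comp_dens <- function(x, w, mu, sigma) {
  m <- sapply(seq_along(w), function(j) w[j] * dnorm(x, mu[j], sigma[j]))
  matrix(m, nrow = length(x))
}

# EM for a k-component univariate Gaussian mixture (illustrative helper).
fit_gmm <- function(x, k, iters = 200, tol = 1e-8) {
  n <- length(x)
  # Start from quantile-based means, a common sd and equal weights.
  mu <- as.numeric(quantile(x, probs = (seq_len(k) - 0.5) / k))
  sigma <- rep(sd(x) / k, k)
  w <- rep(1 / k, k)
  old <- -Inf
  for (i in seq_len(iters)) {
    # E-step: responsibilities of each component for each point.
    dens <- comp_dens(x, w, mu, sigma)
    total <- rowSums(dens)
    cur <- sum(log(total))
    resp <- dens / total
    # M-step: update weights, means and standard deviations.
    nk <- colSums(resp)
    w <- nk / n
    mu <- colSums(resp * x) / nk
    sigma <- sqrt(colSums(resp * outer(x, mu, "-")^2) / nk)
    sigma <- pmax(sigma, 1e-3)  # crude guard against degenerate components
    if (abs(cur - old) < tol * abs(cur)) break
    old <- cur
  }
  # Log-likelihood at the final parameter values.
  loglik <- sum(log(rowSums(comp_dens(x, w, mu, sigma))))
  list(w = w, mu = mu, sigma = sigma, loglik = loglik,
       bic = -2 * loglik + (3 * k - 1) * log(n))
}

# Fit 1 to 4 components and compare by BIC (smaller is better).
fits <- lapply(1:4, function(k) fit_gmm(x, k))
bic <- sapply(fits, function(f) f$bic)

# Mass of the 3-component fit between its two lowest component means.
best <- fits[[3]]
o <- order(best$mu)
a <- best$mu[o][1]
b <- best$mu[o][2]
area12 <- sum(best$w * (pnorm(b, best$mu, best$sigma) -
                        pnorm(a, best$mu, best$sigma)))
```

EM only finds a local maximum, so the result depends on the starting values;
that is one face of the convergence problems mentioned above. BIC is used
here merely as a convenient selection criterion, not as the last word on
choosing the number of components.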

Cheers,
Bert Gunter
 

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of hadley wickham
> Sent: Wednesday, July 19, 2006 9:21 AM
> To: Ulrik Stervbo
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Fitting a distribution to peaks in histogram
> 
> > I would like to fit a distribution to each of the peaks in a histogram, such
> > as this: http://photos1.blogger.com/blogger/7029/2724/1600/DU145-Bax3-Bcl-xL.png
> 
> As a first shot, I'd try fitting a mixture of gamma distributions (say
> 3), plus a constant term for the highest bin.  You could do this using
> maximum likelihood (ML).  If the number of peaks is truly unknown, this
> will be a little trickier but still possible, and you could use a
> likelihood ratio test (LRT) to choose between them.
> 
> > Integrate the area between each two peaks, using the means and widths of the
> > distributions fitted to the two peaks. I will be using the integrate
> > function
> 
> Why do you want to do this?
> 
> >
> > The histogram is based on approximately 15000 events, which makes Mclust and
> > pam (which both deliver the information I need) less useful.
> 
> If you have unbinned data, it would be better (more precise/powerful)
> to use that.
> 
> Regards,
> 
> Hadley
> 


