[R] Distribution plus background fitting

David Winsemius dwinsemius at comcast.net
Mon Mar 11 16:58:20 CET 2013


On Mar 11, 2013, at 3:40 AM, Richard Longland wrote:

> Hi All,
> 
> I apologise if this question has been answered before, but my background is
> a little different from most people using R, and the language we use seems
> to be different! I am trying to analyse some nuclear physics data, which
> consists of an ensemble of "energy" readings in a detector that, when
> binned, form a number of Gaussian shaped peaks superimposed on a varying
> background (see attached pdf image).
> 
> In my field, the common practice is to bin these readings into a histogram,
> and then use a variety of chi-squared fitting techniques to fit a linear
> background and Gaussian directly to the binned counts to obtain a position
> and area under a peak. I understand that this is poor practice, especially
> given my rather low counting rate, so I have been attempting to fit the raw
> data with the fitdistrplus package (linked to from
> http://www.r-bloggers.com/fitting-distributions-with-r).
> 
> I can fit an isolated Gaussian peak successfully using these methods (the
> central peak in my attached image). However, I cannot find a suitable
> distribution to account for the background in the peaks to the left of this
> (which I assume to vary linearly under each peak for now). Is this possible
> with usual distribution fitting methods, or should I stick to fitting the
> binned data (possibly including some interval censoring corrections)?

When you say "each peak" it suggests either that you know where peaks should be on the basis of theory or your eyeball-brain interaction is producing a filtered assessment of where your wetware peak detectors are registering a probable peak. Yu are implicitly creating some sort of distribution for the major peaks and then looking within the shoulders of those peaks for secondary peaks. (My detectors suggest perhaps as many 8 peaks but human beings are notorious for finding patterns in noise.) 

You have a mixture distribution and one possibility is that these are mixtures of normal distributions (aka Gaussians) with different mixing proportions. I do not think the fitdistrplus functions are designed for that task. When I go looking for packages and functions to do a task I use the sos package and the `findFn` function therein:

install.packages("sos")
library(sos)
findFn("mixture of normals")

The function I see in that list of 93 candidates that seems most likely to address my understanding of the problem is:

http://finzi.psych.upenn.edu/R/library/mixtools/html/normalmixEM.html

But it also looks as though there are other candidates. I would not think that binning of the data should be needed. Leaving it in its original state should allow the proper estimation of the data features by algorithms, unless requested by package authors in their vignettes or help pages. 

-- 

David Winsemius
Alameda, CA, USA



More information about the R-help mailing list