[R] Fitting a distribution to peaks in histogram

Ulrik Stervbo ulriks at ruc.dk
Thu Jul 20 11:08:27 CEST 2006


On 7/19/06, hadley wickham <h.wickham at gmail.com> wrote:
> > Can you be a bit more exact? I am a biologist and relatively new to R
>
> In that case, I would _strongly_ advise that you get advice from a
> local statistician.

I am afraid that, by comparison, I am the local statistician. I am also
the local R-guru, and neither is saying much, so please bear with
me.

Do you know of some functions (built in hopefully) that I can try?

I did try the density estimate from the mclust package, but got an
out-of-memory error. I did look at the ash package, but I am afraid I
failed to see how I can use it.

At the moment, I am estimating the density using density() from the
stats package, identifying the peaks in the density estimate with
Petr's function, and can thus extract a very good suggestion for the
mean and intensity of each peak - surely that must be useful for
something? Based on the literature I also have a very good suggestion
for an upper and lower width of the distribution.
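
A minimal sketch of that density-and-peaks step in base R, for
illustration only. The data are simulated, and find_peaks() is a
hypothetical stand-in written for this example (it is not Petr's
function, which is not shown in this thread). The 10% height cut-off
is likewise an arbitrary assumption.

```r
# Simulated DNA-content data (an assumption, not real measurements):
# two populations centred at 200 and 400 arbitrary units.
set.seed(1)
dna <- c(rnorm(700, mean = 200, sd = 15),
         rnorm(300, mean = 400, sd = 25))

# Kernel density estimate with density() from the stats package.
d <- density(dna)

# Hypothetical stand-in for a peak finder: indices where the slope of
# the density curve changes from positive to negative.
find_peaks <- function(y) which(diff(sign(diff(y))) == -2) + 1

idx <- find_peaks(d$y)
# Drop small bumps in the tails: keep peaks above 10% of the tallest.
idx <- idx[d$y[idx] > 0.1 * max(d$y)]

# Position (a suggestion for the mean) and height of each peak.
peaks <- data.frame(position = d$x[idx], height = d$y[idx])
print(peaks)
```

The positions in peaks could then serve as starting values for a
model fit, and the literature-based widths as bounds.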

>
> > I am measuring the amount of DNA in cells, and I need to know the
> > percentage of cells in a part of the cell cycle; that is, the
> > percentage of cells in the first peak, in the second peak and so
> > on. I want to integrate the area between the two peaks, because
> > that apparently is how it's done (as far as I can tell from the
> > literature)
>
> That doesn't sound quite right to me, because you also need to take
> into account the fact that some cells between peak 1 and 2 belong to
> peak 1, and some to peak 2.  This is something that will come out
> immediately from a mixture based approach. If you know that peaks
> correspond to certain parts of the cell cycle, then this is important
> information that should be included in the analysis.

I realise that some cells between two peaks belong to the peaks, but
thought that this was a general problem, with accuracy usually
sacrificed for speed. One of the most widely used programs for
analysing the cell cycle uses a variant of my strategy as far as I can
tell: fitting Gaussian distributions to the two peaks and integrating
the part between. The reason why I am not using this program is that I
cannot afford it, and it does a very poor job when analysing cells
with abnormal amounts of DNA.
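
To illustrate the point about cells between the peaks, here is a
rough sketch of a two-component Gaussian mixture fitted by a
hand-written EM loop in base R. It is not Mclust and not the method
of the commercial program. The data are simulated, and the starting
values are made-up numbers standing in for peak positions taken from
a density estimate and widths taken from the literature.

```r
# Simulated data (an assumption): 70% of cells near 200, 30% near 400.
set.seed(1)
dna <- c(rnorm(700, mean = 200, sd = 15),
         rnorm(300, mean = 400, sd = 25))

# Hypothetical starting values for means, widths and proportions.
mu    <- c(190, 410)
sigma <- c(20, 20)
p     <- c(0.5, 0.5)

for (iter in 1:200) {
  # E-step: probability that each cell belongs to each peak.
  w <- cbind(p[1] * dnorm(dna, mu[1], sigma[1]),
             p[2] * dnorm(dna, mu[2], sigma[2]))
  w <- w / rowSums(w)

  # M-step: update proportions, means and standard deviations.
  n_k   <- colSums(w)
  p     <- n_k / length(dna)
  mu    <- colSums(w * dna) / n_k
  sigma <- sqrt(colSums(w * outer(dna, mu, "-")^2) / n_k)
}

# Estimated percentage of cells in each peak, with fitted mean and sd.
fit <- data.frame(percent = 100 * p, mean = mu, sd = sigma)
print(round(fit, 2))
```

Each row of w gives, for one cell, the share assigned to each peak,
so a cell lying between the peaks is split between them rather than
counted in neither.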

Ulrik

-- 
Blog: http://ulrikstervbo.blogspot.com


