[R] mixture univariate distributions fit

Thu Dec 30 17:09:42 CET 2021

Petr:

1. I now am somewhat confused by your goals. Any curve fitting
procedure can fit curves to data, including discrete points on a
smooth curve. What cannot be done is to recover the exact parameters
of the smooth curve from which your data derive, at least not without
knowing how the curve was fitted to that underlying (particle size)
data, i.e. the nature of the parameterization (if even there was one
-- it may just be some sort of empirical smoother like kernel density
estimation or some such).

2. mixtools uses EM, so the similarity you noted is probably not surprising.

3. Given my evident confusion, I hope that this email may prompt
someone more knowledgeable than I to correct or clarify my "advice."
You may wish to further clarify what you wish to do with the
parameters that you hope to get to encourage such comments. For
example, do you wish to compare the densities you get via their
parameterizations? If so, others may be able to offer better
strategies to do this, perhaps on SO --
https://stats.stackexchange.com/  -- rather than here.

Cheers,
Bert Gunter

On Thu, Dec 30, 2021 at 12:36 AM PIKAL Petr <petr.pikal using precheza.cz> wrote:
>
> Thank you Bert
>
> The values are results from particle size measurement by sedimentation and
> they are really available only as these cumulative or density distributions.
> What I thought about was that there is some package which could fit data of
> such curves and deliver parameters of fitted curves.
>
> Something like
> https://chrisostrouchov.com/post/peak_fit_xrd_python/
>
> I found package EMpeaksR which results close to values estimated from mixtools
> package.
>
> test <- spect_em_gmm(temp1$velik, temp1$proc, mu=c(170, 220),
> mix_ratio=c(1,1), sigma=c(5,5), maxit=2000, conv.cri=1e-8)
> print(cbind(test$mu, test$sigma, test$mix_ratio))
>          [,1]      [,2]      [,3]
> [1,] 170.7744  7.200109 0.5759867
> [2,] 229.1815 10.831626 0.4240133
>
> But it is probably in stage of intensive development as it is limited in data
> visualisation
>
> Any further hint is appreciated.
>
> Regards
> Petr
>
> > -----Original Message-----
> > From: Bert Gunter <bgunter.4567 using gmail.com>
> > Sent: Wednesday, December 29, 2021 5:01 PM
> > To: PIKAL Petr <petr.pikal using precheza.cz>
> > Cc: r-help mailing list <r-help using r-project.org>
> > Subject: Re: [R] mixture univariate distributions fit
> >
> > No.
> >
> > However, if the object returned is the "Value" structure of whatever density
> > function you use, it probably contains the original data. You need to check
> > the docs to see. But this does not appear to be your situation.
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along and
> > sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> > On Wed, Dec 29, 2021 at 3:05 AM PIKAL Petr <petr.pikal using precheza.cz>
> > wrote:
> > >
> > > Dear all
> > >
> > > I have data which are either density distribution estimate or
> > > cummulative density distribution estimate (temp1, temp2 below). I
> > > would like to get values (mu, sd) for underlaying original data but they
> > > are
> > not available.
> > >
> > > I found mixtools package which calculate what I need but it requires
> > > original data (AFAIK). They could be generated from e.g. temp1 by
> > >
> > > set.seed(111)
> > > x<- sample(temp1$velik, size=100000, replace=TRUE, prob=temp1$proc)
> > >
> > > library(mixtools)
> > > fit <- normalmixEM(x)
> > > plot(fit, which=2)
> > > summary(fit)
> > > summary of normalmixEM object:
> > >            comp 1     comp 2
> > > lambda   0.576346   0.423654
> > > mu     170.784520 229.192823
> > > sigma    7.203491  10.793461
> > > loglik at estimate:  -424062.7
> > > >
> > >
> > > Is there any way how to get such values directly from density or
> > > cummulative density estimation without generating fake data by sample?
> > >
> > > Best regards
> > > Petr
> > >
> > >
> > > temp1 <- structure(list(velik = c(155, 156.8, 157.9, 158.8, 159.6,
> > > 160.4, 161.2, 161.9, 162.5, 163.1, 163.8, 164.3, 164.7, 165.3, 165.8,
> > > 166.2, 166.7, 167.2, 167.7, 168.2, 168.7, 169.1, 169.6, 170.1, 170.6,
> > > 171.1, 171.6, 172, 172.5, 173, 173.5, 174, 174.5, 175.1, 175.7, 176.3,
> > > 177, 177.6, 178.3, 179.1, 179.9, 180.6, 181.4, 182.4, 183.5, 184.7,
> > > 186.1, 187.9, 189.8, 192, 194.4, 197, 200.1, 203.5, 206.7, 209.2,
> > > 211.3, 213.1, 214.8, 216.3, 217.4, 218.5, 219.5, 220.4, 221.3, 222.1,
> > > 223, 223.7, 224.5, 225.2, 225.9, 226.7, 227.5, 228.2, 228.9, 229.6,
> > > 230.4, 231.2, 231.9, 232.6, 233.4, 234.2, 235, 235.9, 236.8, 237.7,
> > > 238.6, 239.7, 241, 242.3, 243.6, 245.2, 247.1, 249.3, 251.9, 255.3,
> > > 260, 266, 274.9, 323.4 ), proc = c(0.6171, 1.583, 1.371, 2.13, 1.828,
> > > 2.095, 1.994, 2.694, 2.824, 2.41, 2.909, 3.768, 3.179, 3.029, 3.798,
> > > 3.743, 3.276, 3.213, 3.579, 2.928, 4.634, 3.415, 3.473, 3.135, 3.476,
> > > 3.759, 3.726, 3.9, 3.593, 2.89, 3.707, 4.08, 2.846, 2.685, 3.394,
> > > 2.737, 2.693, 2.878, 2.248, 2.368, 2.258, 2.662, 1.866, 1.895, 1.457,
> > > 1.513, 1.181, 1.008, 0.9641, 0.799, 0.7878, 0.7209, 0.5869, 0.5778,
> > > 0.7313, 0.9531, 1.053, 1.317, 1.247, 1.739, 2.064, 1.99, 2.522, 2.401,
> > > 2.48, 2.687, 2.797, 2.918, 3.243, 3.055, 3.009, 2.89, 3.037, 3.25,
> > > 3.349, 3.141, 2.771, 2.985, 3.203, 3.298, 3.215, 2.637, 2.683, 2.782,
> > > 2.632, 2.625, 2.475, 2.014, 1.781, 1.987, 1.627, 1.374, 1.352, 0.9441,
> > > 1.01, 0.5737, 0.5265, 0.3794, 0.2513, 0.0351)), row.names = 2:101,
> > > class = "data.frame")
> > >
> > > temp2 <- structure(list(velik = c(153.8, 156.3, 157.3, 158.4, 159.2,
> > > 160.1, 160.8, 161.6, 162.2, 162.8, 163.5, 164, 164.5, 165, 165.5, 166,
> > > 166.4, 166.9, 167.5, 167.9, 168.5, 168.9, 169.4, 169.8, 170.4, 170.9,
> > > 171.3, 171.8, 172.2, 172.7, 173.3, 173.8, 174.2, 174.8, 175.5, 176,
> > > 176.6, 177.3, 177.9, 178.7, 179.5, 180.3, 180.9, 181.9, 182.9, 184.1,
> > > 185.3, 186.9, 188.8, 190.8, 193.2, 195.6, 198.4, 201.8, 205.3, 208.1,
> > > 210.3, 212.3, 213.9, 215.7, 216.9, 218, 219.1, 219.9, 220.8, 221.7,
> > > 222.6, 223.4, 224.1, 224.8, 225.6, 226.3, 227.1, 227.8, 228.5, 229.2,
> > > 230, 230.8, 231.6, 232.3, 233, 233.7, 234.6, 235.5, 236.3, 237.2,
> > > 238.1, 239.1, 240.3, 241.6, 242.9, 244.4, 246.1, 248, 250.6, 253.1,
> > > 257.6, 262.5, 269.5, 280.4, 372.9), proc = c(0, 1, 2, 3, 4, 5, 6, 7,
> > > 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
> > > 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
> > > 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
> > > 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
> > > 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93,
> > > 94, 95, 96, 97, 98, 99, 100)), row.names = c(NA, 101L), class =
> > > "data.frame")
> > > >
> > >
> > > >
> > > ______________________________________________
> > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.