[R] mixture univariate distributions fit

PIKAL Petr petr.pikal at precheza.cz
Fri Dec 31 08:59:11 CET 2021


Hello Bert

Sorry for the confusion, I may not have used the correct wording when describing 
what I wanted to do. Consider this example:

# 2 different distribution densities mixed together in 1:2 proportion
x <- (0:100)/100
y1 <- dnorm(x, mean=.3, sd=.1)
y2 <- dnorm(x, mean=.7, sd=.1)
ymix <- (y1+2*y2)/max(y1+2*y2)
plot(x, ymix)

# sampling x based on the mixed distribution density
dist <- sample(x, size=10000, replace=TRUE, prob=ymix)
library(mixtools)
fit <- normalmixEM(dist)
summary(fit)

summary of normalmixEM object:
         comp 1    comp 2
lambda 0.334926 0.6650742
mu     0.304862 0.7036926
sigma  0.098673 0.0994382
loglik at estimate:  30935.34

I can get estimates of mu and sigma together with the proportion (lambda) of each 
distribution in the mixture. My question is whether there is some package or function 
which could get those values ***directly from the x and ymix values***, which is 
basically what is measured in my case.

I know I could code some optimisation procedure and adapt what was done in 
Python https://chrisostrouchov.com/post/peak_fit_xrd_python/ for peak fitting, 
or this one 
https://stats.stackexchange.com/questions/92748/multi-peak-gaussian-fit-in-r, 
but I would rather not reinvent the wheel.
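
For completeness, the kind of optimisation I have in mind would be something
like this rough sketch, which fits the two-Gaussian model to x and ymix by
least squares with optim(); the parameterisation and the starting values
(eyeballed from the plot) are my own assumptions, not a packaged solution:

# model curve: scaled two-component Gaussian mixture
# p = c(lambda, mu1, mu2, sigma1, sigma2, scale)
mixcurve <- function(p, x) {
  p[6] * (p[1] * dnorm(x, p[2], p[4]) + (1 - p[1]) * dnorm(x, p[3], p[5]))
}
# least-squares objective over the measured curve (x, ymix)
sse <- function(p) sum((ymix - mixcurve(p, x))^2)
fit2 <- optim(c(0.5, 0.3, 0.7, 0.1, 0.1, 0.4), sse, method = "BFGS")
round(fit2$par, 4)

But that is still hand-rolled code, which is what I hoped a package already does.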

I went through the Chemometrics and Computational Physics task view, which I 
consider the most likely place for such functions, but what is there is usually 
quite specific, especially in the required form of the input data.

Anyway, thanks for your effort.

I wish you all the best for the new year 2022.
Petr


> -----Original Message-----
> From: Bert Gunter <bgunter.4567 using gmail.com>
> Sent: Thursday, December 30, 2021 5:10 PM
> To: PIKAL Petr <petr.pikal using precheza.cz>
> Cc: r-help mailing list <r-help using r-project.org>
> Subject: Re: [R] mixture univariate distributions fit
>
> Petr:
>
> 1. I now am somewhat confused by your goals. Any curve fitting procedure
> can fit curves to data, including discrete points on a smooth curve. What
> cannot be done is to recover the exact parameters of the smooth curve from
> which your data derive, at least not without knowing how the curve was
> fitted to that underlying (particle size) data, i.e. the nature of the
> parameterization (if there even was one
> -- it may just be some sort of empirical smoother like kernel density
> estimation or some such).
>
> 2. mixtools uses EM, so the similarity you noted is probably not surprising.
>
> 3. Given my evident confusion, I hope that this email may prompt someone
> more knowledgeable than I to correct or clarify my "advice."
> You may wish to further clarify what you intend to do with the parameters
> you hope to obtain, to encourage such comments. For example, do you wish to
> compare the densities you get via their parameterizations? If so, others may
> be able to offer better strategies to do this, perhaps on Cross Validated --
> https://stats.stackexchange.com/  -- rather than here.
>
> Cheers,
> Bert Gunter
>
> On Thu, Dec 30, 2021 at 12:36 AM PIKAL Petr <petr.pikal using precheza.cz>
> wrote:
> >
> > Thank you Bert
> >
> > The values are results from particle size measurement by sedimentation
> > and they are really available only as these cumulative or density
> > distributions.
> > What I thought about was that there is some package which could fit
> > data of such curves and deliver parameters of fitted curves.
> >
> > Something like
> > https://chrisostrouchov.com/post/peak_fit_xrd_python/
> >
> > I found the package EMpeaksR, which gives results close to the values
> > estimated by the mixtools package.
> >
> > library(EMpeaksR)
> > test <- spect_em_gmm(temp1$velik, temp1$proc, mu=c(170, 220),
> >   mix_ratio=c(1,1), sigma=c(5,5), maxit=2000, conv.cri=1e-8)
> > print(cbind(test$mu, test$sigma, test$mix_ratio))
> >          [,1]      [,2]      [,3]
> > [1,] 170.7744  7.200109 0.5759867
> > [2,] 229.1815 10.831626 0.4240133
> >
> > But it is probably at a stage of intensive development, as it is still
> > limited in data visualisation.
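> >
> > For instance, to inspect the fit I currently overlay the fitted components
> > on the measured curve by hand, roughly like this (my own ad hoc sketch, not
> > an EMpeaksR function; both curves are rescaled to a maximum of 1 just to
> > compare shapes):
> >
> > dens <- test$mix_ratio[1] * dnorm(temp1$velik, test$mu[1], test$sigma[1]) +
> >         test$mix_ratio[2] * dnorm(temp1$velik, test$mu[2], test$sigma[2])
> > plot(temp1$velik, temp1$proc / max(temp1$proc), type = "l")
> > lines(temp1$velik, dens / max(dens), col = 2)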
> >
> > Any further hint is appreciated.
> >
> > Regards
> > Petr
> >
> > > -----Original Message-----
> > > From: Bert Gunter <bgunter.4567 using gmail.com>
> > > Sent: Wednesday, December 29, 2021 5:01 PM
> > > To: PIKAL Petr <petr.pikal using precheza.cz>
> > > Cc: r-help mailing list <r-help using r-project.org>
> > > Subject: Re: [R] mixture univariate distributions fit
> > >
> > > No.
> > >
> > > However, if the object returned is the "Value" structure of whatever
> > > density function you use, it probably contains the original data.
> > > You need to check the docs to see. But this does not appear to be your
> > > situation.
> > >
> > > Bert Gunter
> > >
> > > "The trouble with having an open mind is that people keep coming
> > > along and sticking things into it."
> > > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> > >
> > > On Wed, Dec 29, 2021 at 3:05 AM PIKAL Petr <petr.pikal using precheza.cz>
> > > wrote:
> > > >
> > > > Dear all
> > > >
> > > > I have data which are either a density distribution estimate or a
> > > > cumulative density distribution estimate (temp1, temp2 below). I
> > > > would like to get the values (mu, sd) of the underlying original data,
> > > > but they are not available.
> > > >
> > > > I found the mixtools package, which calculates what I need, but it
> > > > requires the original data (AFAIK). Such data could be generated from
> > > > e.g. temp1 by
> > > >
> > > > set.seed(111)
> > > > x<- sample(temp1$velik, size=100000, replace=TRUE,
> > > > prob=temp1$proc)
> > > >
> > > > library(mixtools)
> > > > fit <- normalmixEM(x)
> > > > plot(fit, which=2)
> > > > summary(fit)
> > > > summary of normalmixEM object:
> > > >            comp 1     comp 2
> > > > lambda   0.576346   0.423654
> > > > mu     170.784520 229.192823
> > > > sigma    7.203491  10.793461
> > > > loglik at estimate:  -424062.7
> > > > >
> > > >
> > > > Is there any way to get such values directly from the density or
> > > > cumulative density estimate without generating fake data with sample()?
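> > > >
> > > > For the cumulative form (temp2) I suppose the per-bin weights could be
> > > > recovered by differencing first, so the question really reduces to
> > > > fitting from a density-like curve; a rough sketch of my own:
> > > >
> > > > w <- diff(temp2$proc)                          # mass per bin, in per cent
> > > > mid <- (head(temp2$velik, -1) + tail(temp2$velik, -1)) / 2  # bin midpoints
> > > > head(cbind(mid, w))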
> > > >
> > > > Best regards
> > > > Petr
> > > >
> > > >
> > > > temp1 <- structure(list(velik = c(155, 156.8, 157.9, 158.8, 159.6,
> > > > 160.4, 161.2, 161.9, 162.5, 163.1, 163.8, 164.3, 164.7, 165.3,
> > > > 165.8, 166.2, 166.7, 167.2, 167.7, 168.2, 168.7, 169.1, 169.6,
> > > > 170.1, 170.6, 171.1, 171.6, 172, 172.5, 173, 173.5, 174, 174.5,
> > > > 175.1, 175.7, 176.3, 177, 177.6, 178.3, 179.1, 179.9, 180.6,
> > > > 181.4, 182.4, 183.5, 184.7, 186.1, 187.9, 189.8, 192, 194.4, 197,
> > > > 200.1, 203.5, 206.7, 209.2, 211.3, 213.1, 214.8, 216.3, 217.4,
> > > > 218.5, 219.5, 220.4, 221.3, 222.1, 223, 223.7, 224.5, 225.2,
> > > > 225.9, 226.7, 227.5, 228.2, 228.9, 229.6, 230.4, 231.2, 231.9,
> > > > 232.6, 233.4, 234.2, 235, 235.9, 236.8, 237.7, 238.6, 239.7, 241,
> > > > 242.3, 243.6, 245.2, 247.1, 249.3, 251.9, 255.3, 260, 266, 274.9,
> > > > 323.4 ), proc = c(0.6171, 1.583, 1.371, 2.13, 1.828, 2.095, 1.994,
> > > > 2.694, 2.824, 2.41, 2.909, 3.768, 3.179, 3.029, 3.798, 3.743,
> > > > 3.276, 3.213, 3.579, 2.928, 4.634, 3.415, 3.473, 3.135, 3.476,
> > > > 3.759, 3.726, 3.9, 3.593, 2.89, 3.707, 4.08, 2.846, 2.685, 3.394,
> > > > 2.737, 2.693, 2.878, 2.248, 2.368, 2.258, 2.662, 1.866, 1.895,
> > > > 1.457, 1.513, 1.181, 1.008, 0.9641, 0.799, 0.7878, 0.7209, 0.5869,
> > > > 0.5778, 0.7313, 0.9531, 1.053, 1.317, 1.247, 1.739, 2.064, 1.99,
> > > > 2.522, 2.401, 2.48, 2.687, 2.797, 2.918, 3.243, 3.055, 3.009,
> > > > 2.89, 3.037, 3.25, 3.349, 3.141, 2.771, 2.985, 3.203, 3.298,
> > > > 3.215, 2.637, 2.683, 2.782, 2.632, 2.625, 2.475, 2.014, 1.781,
> > > > 1.987, 1.627, 1.374, 1.352, 0.9441, 1.01, 0.5737, 0.5265, 0.3794,
> > > > 0.2513, 0.0351)), row.names = 2:101, class = "data.frame")
> > > >
> > > > temp2 <- structure(list(velik = c(153.8, 156.3, 157.3, 158.4,
> > > > 159.2, 160.1, 160.8, 161.6, 162.2, 162.8, 163.5, 164, 164.5, 165,
> > > > 165.5, 166, 166.4, 166.9, 167.5, 167.9, 168.5, 168.9, 169.4,
> > > > 169.8, 170.4, 170.9, 171.3, 171.8, 172.2, 172.7, 173.3, 173.8,
> > > > 174.2, 174.8, 175.5, 176, 176.6, 177.3, 177.9, 178.7, 179.5,
> > > > 180.3, 180.9, 181.9, 182.9, 184.1, 185.3, 186.9, 188.8, 190.8,
> > > > 193.2, 195.6, 198.4, 201.8, 205.3, 208.1, 210.3, 212.3, 213.9,
> > > > 215.7, 216.9, 218, 219.1, 219.9, 220.8, 221.7, 222.6, 223.4,
> > > > 224.1, 224.8, 225.6, 226.3, 227.1, 227.8, 228.5, 229.2, 230,
> > > > 230.8, 231.6, 232.3, 233, 233.7, 234.6, 235.5, 236.3, 237.2,
> > > > 238.1, 239.1, 240.3, 241.6, 242.9, 244.4, 246.1, 248, 250.6,
> > > > 253.1, 257.6, 262.5, 269.5, 280.4, 372.9), proc = c(0, 1, 2, 3, 4,
> > > > 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
> > > > 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
> > > > 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,
> > > > 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
> > > > 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86,
> > > > 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100)),
> > > > row.names = c(NA, 101L), class =
> > > > "data.frame")
> > > > >
> > > >
> > > > >
> > > > ______________________________________________
> > > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.

