[R] separate mixture of gamma and normal (with mixtools or ??)

Thomas Petzoldt thpe at simecol.de
Tue Jan 24 08:47:16 CET 2017


Dear Bert,

thank you very much for your suggestion. You are right, ill-conditioning 
was sometimes a problem for 3 components, but not in the two-component 
case. The modes are well separated, and the sample size is high.

My main problem is (1) the shape of the distributions and (2) the 
diversity of available packages and approaches to this topic.

In the mean time I made some progress in this matter by treating the 
data as a mixture of gamma distributions (package mixdist, see below), 
so what I want to know is the purely R technical question ;-)

Has someone else has ever stumbled across something like this and can 
make a suggestion which package to use?

Thanks, Thomas


## Approximate an Exponential+Gaussian mixture with
##   a mixture of Gammas

library("mixdist")
set.seed(123)
lambda <- c(0.25, 0.75)
N <- 2000
x <- c(rexp(lambda[1]*N, 1), rnorm(lambda[2]*N, 20, 4))

xx <- mixgroup(x, breaks=0:40)
pp <- mixparam(mu=c(1, 8), sigma=c(1, 3), pi=c(0.2, 0.5))
mix <-  mix(xx, pp, dist="gamma", emsteps=10)

summary(mix)
p <- coef(mix)
beta <- with(p, sigma^2/mu)
alpha <- with(p, mu /beta)
lambda <- p$pi

plot(mix, xlim=c(0, 35))
x1 <- seq(0, 35, 0.1)
lines(x1, lambda[1]*dgamma(x1, alpha[1], 1/beta[1]),
   col="orange", lwd=2)
lines(x1, lambda[2]*dgamma(x1, alpha[2], 1/beta[2]),
   col="magenta", lwd=2)



Am 24.01.2017 um 00:34 schrieb Bert Gunter:
> Fitting multicomponent mixtures distributions -- and 3 is already a
> lot of components -- is inherently ill-conditioned. You may need to
> reassess your strategy. You might wish to post on stackexchange
> instead to discuss such statistical issues.
>
> Cheers,
> Bert
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Mon, Jan 23, 2017 at 2:32 PM, Thomas Petzoldt <thpe at simecol.de> wrote:
>> Dear friends,
>>
>> I am trying to separate bi- (and sometimes tri-) modal univariate mixtures
>> of biological data, where the first component is left bounded (e.g.
>> exponential or gamma) and the other(s) approximately Gaussian.
>>
>> After checking several packages, I'm not really clear what to do. Here is an
>> example with "mixtools" that already works quite good, however, the left
>> component is not Gaussian (and not symmetric).
>>
>> Any idea about a more adequate function or package for this problem?
>>
>> Thanks a lot!
>>
>> Thomas
>>
>>
>>
>> library(mixtools)
>> set.seed(123)
>>
>> lambda <- c(0.25, 0.75)
>> N      <- 200
>>
>> ## dist1 ~ gamma (or exponential as a special case)
>> #dist1 <- rexp(lambda[1]*N, 1)
>> dist1 <- rgamma(lambda[1]*N, 1, 1)
>>
>> ## dist2 ~ normal
>> dist2 <- rnorm(lambda[2]*N, 12, 2)
>>
>> ## mixture
>> x <- c(dist1, dist2)
>>
>> mix <-  spEMsymloc(x, mu0=2, eps=1e-3, verbose=TRUE)
>> plot(mix, xlim=c(0, 25))
>> summary(mix)
>>
>>
>> --
>> Thomas Petzoldt
>> TU Dresden, Institute of Hydrobiology
>> http://www.tu-dresden.de/Members/thomas.petzoldt
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list