[R] Fitting Mixture distributions

Thu Sep 8 14:38:10 CEST 2016

>>>>> Bert Gunter <bgunter.4567 at gmail.com>
>>>>>     on Wed, 7 Sep 2016 23:47:40 -0700 writes:

    > "please suggest what can I do to resolve this
    > issue."

    > Fitting normal mixtures can be difficult, and sometimes the
    > optimization algorithm (EM) will get stuck with very slow convergence.
    > Presumably there are options in the package to either increase the max
    > number of steps before giving up or make the convergence criteria less
    > sensitive. The former will increase the run time and the latter will
    > reduce the optimality (possibly leaving you farther from the true
    > optimum). So you should look into changing these as you think
    > appropriate.

I'm jumping in late, without having read everything preceding.

One of the last messages seemed to indicate that you are looking
at mixtures of *one*-dimensional gaussians.

If this is the case, I strongly recommend looking at (my) CRAN
package 'nor1mix' (the "1" is for "*one*-dimensional").

For a while now, that small package has been providing an alternative
to the EM, namely direct MLE, simply using optim(<likelihood>) where the
likelihood uses a somewhat smart parametrization.
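
A minimal base-R sketch of the direct-MLE idea, for illustration only.
The parametrization used here (logit of the weight, log of the standard
deviations) and the median-split start are assumptions made for this
example; they are not necessarily what nor1mix does internally. The data
are simulated.

```r
# Direct MLE for a two-component 1-D normal mixture via optim().
# Assumed parametrization (illustrative only):
#   theta = (logit(w1), mu1, mu2, log(sd1), log(sd2))
set.seed(1)
x <- c(rnorm(300, mean = 0, sd = 1), rnorm(100, mean = 4, sd = 0.7))

negloglik <- function(theta, x) {
  w  <- plogis(theta[1])          # weight of component 1, kept in (0, 1)
  mu <- theta[2:3]
  sg <- exp(theta[4:5])           # standard deviations, kept positive
  dens <- w * dnorm(x, mu[1], sg[1]) + (1 - w) * dnorm(x, mu[2], sg[2])
  -sum(log(dens))
}

# Crude starting value: split the sample at its median.
lo <- x[x <= median(x)]
hi <- x[x >  median(x)]
start <- c(0, mean(lo), mean(hi), log(sd(lo)), log(sd(hi)))

fit <- optim(start, negloglik, x = x, method = "BFGS")

# Back-transform to the natural parameters.
est <- list(w     = plogis(fit$par[1]),
            mu    = fit$par[2:3],
            sigma = exp(fit$par[4:5]))
print(est)
```

Because the weight and the standard deviations are transformed, optim()
can search over an unconstrained space and never proposes an invalid
weight or a negative standard deviation.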

Of course, *as with the EM*, this also depends on the starting value,
but my (limited) experience has been that
  nor1mix::norMixMLE()
works considerably faster and more reliably than the EM (which I
also provide, as nor1mix::norMixEM()).

Apropos 'starting value': The help page shows how to use
kmeans() for "somewhat" reliable starts; alternatively, I'd
recommend using cluster::pam() to get a start there.
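
A hedged sketch of such starts on simulated data. It assumes that the
'nor1mix' and 'cluster' packages are installed and that the `m` argument
of norMixMLE() / norMixEM() accepts a vector of initial cluster
assignments, as described on the package help page; check ?norMixMLE for
the installed version before relying on it.

```r
# Simulated two-component data (illustrative only).
set.seed(1)
x <- c(rnorm(300, mean = 0, sd = 1), rnorm(100, mean = 4, sd = 0.7))

have_pkgs <- requireNamespace("nor1mix", quietly = TRUE) &&
             requireNamespace("cluster", quietly = TRUE)

if (have_pkgs) {
  # Initial groupings from two clustering methods.
  start_km  <- kmeans(x, centers = 2)$cluster
  start_pam <- cluster::pam(x, k = 2)$clustering

  # Direct MLE and EM from the same k-means start (assumed use of 'm').
  fit_mle <- nor1mix::norMixMLE(x, m = start_km)
  fit_em  <- nor1mix::norMixEM(x,  m = start_km)

  # Direct MLE from the pam() start, for comparison.
  fit_pam <- nor1mix::norMixMLE(x, m = start_pam)

  print(fit_mle)
  print(fit_em)
  print(fit_pam)
}
```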

I'm glad to hear about experiences using these / comparing
these with other approaches.

Martin

--
Martin Maechler,
ETH Zurich

    > On Wed, Sep 7, 2016 at 3:51 PM, Aanchal Sharma
    > <aanchalsharma833 at gmail.com> wrote:
    >> Hi Simon
    >> 
    >> I am facing the same problem as described above. I am trying to fit a Gaussian
    >> mixture model to my data using normalmixEM. I am running an R script that
    >> calls this function in a loop over about 17000 datasets.
    >> The script runs fine for some datasets, but it terminates when it
    >> encounters one dataset with the following error:
    >> 
    >> Error in normalmixEM(expr_glm_residuals, lambda = c(0.75, 0.25), k = 2,  :
    >> Too many tries!
    >> 
    >> Command used:
    >>
    >>   expr_mix_gau <- normalmixEM(expr_glm_residuals, lambda = c(0.75, 0.25),
    >>                               k = 2, epsilon = 1e-08, maxit = 10000,
    >>                               maxrestarts = 200, verb = TRUE)
    >>
    >> (expr_glm_residuals is my dataset, which holds residual values for different
    >> samples.)
    >> 
    >> It has been suggested that one should set mu and sigma in the call by
    >> looking at the dataset. But in my case there are many datasets, and the
    >> values would change every time. Please suggest what I can do to resolve this
    >> issue.
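
Independently of which fitter is used, a loop over many datasets need not
stop at the first failure. The sketch below wraps the fit in tryCatch()
so that a failing dataset is recorded and skipped. The names fit_safely
and toy_fit are hypothetical, and toy_fit is only a stand-in for a real
fitter that occasionally throws an error.

```r
# Run one fit; return NULL (with a message) instead of stopping on error.
fit_safely <- function(x, fit_fun, ...) {
  tryCatch(fit_fun(x, ...),
           error = function(e) {
             message("fit failed: ", conditionMessage(e))
             NULL
           })
}

# Hypothetical stand-in fitter: fails when the data have no spread.
toy_fit <- function(x) {
  if (sd(x) < 1e-8) stop("Too many tries!")
  list(mu = mean(x), sigma = sd(x))
}

set.seed(1)
datasets <- list(a = rnorm(50), b = rep(1, 50), c = rnorm(50, mean = 3))

fits   <- lapply(datasets, fit_safely, fit_fun = toy_fit)
failed <- names(fits)[vapply(fits, is.null, logical(1))]
print(failed)   # "b"

# With mixtools installed, the same wrapper could be used as, e.g.:
#   fits <- lapply(datasets, fit_safely,
#                  fit_fun = mixtools::normalmixEM, k = 2, maxit = 10000)
```

The failed datasets can then be inspected or refitted separately, for
example with different starting values.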
    >> 
    >> Regards
    >> Anchal
    >> 
    >> On Tuesday, 16 July 2013 17:53:09 UTC-4, Simon Zehnder wrote:
    >>> 
    >>> Hi Tjun Kiat Teo,
    >>> 
    >>> You are trying to fit a Normal mixture to some data. The Normal mixture is very
    >>> delicate when it comes to parameter search: if the variance of a component
    >>> gets closer and closer to zero while its mean sits on a data point, the
    >>> log-likelihood grows without bound, whatever the values of the remaining
    >>> parameters. Furthermore, the EM algorithm is known to sometimes take a very
    >>> long time to converge.
    >>> 
    >>> Try the following:
    >>> 
    >>> Use as starting values for the component parameters:
    >>> 
    >>> start.par <- mean(your.data, na.rm = TRUE) + sd(your.data, na.rm = TRUE) *
    >>> runif(K)
    >>> 
    >>> For the weights, just use either 1/K or the R cluster function with K
    >>> clusters.
    >>> 
    >>> Here K is the number of components. Further, increase the maximum number of
    >>> iterations. You could also try randomizing the start parameters and
    >>> running an SEM (Stochastic EM). In my opinion, the better method in this
    >>> case is a Bayesian one: MCMC.
    >>> 
    >>> 
    >>> Best
    >>> 
    >>> Simon
    >>> 
    >>> 
    >>> On Jul 16, 2013, at 10:59 PM, Tjun Kiat Teo <teot... at gmail.com> wrote:
    >>> 
    >>> > I was trying to use normalmixEM in mixtools and I got this error
    >>> > message:
    >>> >
    >>> > One of the variances is going to zero;  trying new starting values.
    >>> > Error in normalmixEM(as.matrix(temp[[gc]][, -(f + 1)])) : Too many tries!
    >>> >
    >>> > Are there any other packages for fitting mixture distributions  ?
    >>> >
    >>> >
    >>> > Tjun Kiat Teo
    >>> >
    >>> > ______________________________________________
    >>> > R-h... at r-project.org mailing list
    >>> > https://stat.ethz.ch/mailman/listinfo/r-help
    >>> > PLEASE do read the posting guide
    >>> http://www.R-project.org/posting-guide.html
    >>> > and provide commented, minimal, self-contained, reproducible code.