[R] MLE for bimodal distribution

Thu Apr 9 00:10:26 CEST 2009

EM algorithm is a better approach for maximum likelihood estimation of finite-mixture models than direct maximization of the mixture log-likelihood.  Due to its ascent properties, it is guaranteed to converge to a local maximum.  By theoretical construction, it also automatically ensures that all the parameter constraints are satisfied.  

The problem with mixture estimation, however, is that the local maximum may not be a good solution.  Even global maximum may not be a good solution.  For example, it is well known that in a Gaussian mixture, you can get a log-likelihood of +infinity by letting mu = one of the data points, and sigma = 0 for one of the mixture components.  You can detect trouble in MLE if you observe one of the following: (1) a degenerate solution, i.e. one of the mixing proportions is close to 0, (2) one of the sigmas is too small.

EM convergence is sensitive to starting values.  Although, there are several starting strategies (see MacLachlan and Basford's book on Finite Mixtures), there is no universally sound strategy for ensuring a good estimate (e.g. global maximum, when it makes sense). It is always a good practice to run EM with multiple starting values.  Be warned that EM convergence can be excruciatingly slow.  Acceleration methods can be of help in large simulation studies or for bootstrapping.

Best,
Ravi.
____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvaradhan at jhmi.edu

----- Original Message -----
From: Bert Gunter <gunter.berton at gene.com>
Date: Wednesday, April 8, 2009 4:14 pm
Subject: Re: [R] MLE for bimodal distribution
To: 'Ben Bolker' <bolker at ufl.edu>, r-help at r-project.org

> The problem is that fitting mixtures models is hard -- (non)identifiability
>  is a serious problem: very large sample sizes may be necessary to clearly
>  distinguish the modes. As V&R say in MASS, 4th edition, p. 320: " ...
>  fitting normal mixtures is a difficult problem, and the results 
> obtained are
>  often heavily dependent on the initial configuration ([i.e. starting 
> values.
>  BG] supplied"
>  
>  The EM algorithm is quite commonly used: you might have a look at 
> em() and
>  related functions in the mclust package if Ben's straightforward approach
>  fails to do the job for you. No guarantee em will work either, of course.
>  
>  -- Bert
>  
>  
>  Bert Gunter
>  Genentech Nonclinical Biostatistics
>  650-467-7374
>  
>  -----Original Message-----
>  From: r-help-bounces at r-project.org [ On
>  Behalf Of Ben Bolker
>  Sent: Wednesday, April 08, 2009 12:47 PM
>  To: r-help at r-project.org
>  Subject: Re: [R] MLE for bimodal distribution
>  
>  
>  
>  
>  _nico_ wrote:
>  > 
>  > Hello everyone,
>  > 
>  > I'm trying to use mle from package stats4 to fit a bi/multi-modal
>  > distribution to some data, but I have some problems with it.
>  > Here's what I'm doing (for a bimodal distribution):
>  > 
>  > # Build some fake binormally distributed data, the procedure fails 
> also
>  > with real data, so the problem isn't here
>  > data = c(rnorm(1000, 3, 0.5), rnorm(500, 5, 0.3))
>  > # Just to check it's bimodal
>  > plot(density(data))
>  > f = function(m, s, m2, s2, w)
>  > {
>  > -log( sum(w*dnorm(data, mean=m, sd=s)) + sum((1-w)*dnorm(data, mean=m2,
>  > sd=s2)) )
>  > }
>  > 
>  > res = mle(f, start=list("m"=3, "s"=0.5, "m2"=5, "s2"=0.35, "w"=0.6))
>  > 
>  > This gives an error:
>  > Error in optim(start, f, method = method, hessian = TRUE, ...) : 
>  >   non-finite finite-difference value [2]
>  > In addition: There were 50 or more warnings (use warnings() to see 
> the
>  > first 50)
>  > And the warnings are about dnorm producing NaNs
>  > 
>  > So, my questions are:
>  > 1) What does "non-finite finite-difference value" mean? My guess 
> would be
>  > that an Inf was passed somewhere where a finite number was expected...?
>  > I'm not sure how optim works though...
>  > 2) Is the log likelihood function I wrote correct? Can I use the 
> same type
>  > of function for 3 modes?
>  > 3) What should I do to avoid function failure? I tried by changing 
> the
>  > parameters, but it did not work.
>  > 4) Can I put constraints to the parameters? In particular, I would 
> like w
>  > to be between 0 and 1.
>  > 5) Is there a way to get the log-likelihood value, so that I can compare
>  > different extimations?
>  > 6) Do you have any (possibly a bit "pratically oriented") readings 
> about
>  > MLE to suggest?
>  > 
>  > Thanks in advance
>  > Nico
>  > 
>  
>    Here's some tweaked code that works.
>  Read about the "L-BFGS-B" method in ?optim to constrain parameters,
>  although you will have some difficulty making this work for more than
>  two components.  I think there's also a mixture model problem in
>  Venables & Ripley (MASS).
>  
>    There are almost certainly more efficient approaches for mixture
>  model fitting, although this "brute force" approach should be fine
>  if you don't need to do anything too complicated.
>  (RSiteSearch("mixture model"))
>  
>  set.seed(1001)
>  data = c(rnorm(1000, 3, 0.5), rnorm(500, 5, 0.3))
>  # Just to check it's bimodal
>  plot(density(data))
>  f = function(m, s, m2, s2, w)
>  {
>    -sum(log(w*dnorm(data, mean=m, sd=s)+
>             (1-w)*dnorm(data, mean=m2, sd=s2)))
>  }
>  
>  library(stats4)
>  start0 <- list("m"=3, "s"=0.5, "m2"=5, "s2"=0.35, "w"=0.6)
>  do.call("f",start0) ## OK
>  res = mle(f, start=start0)
>  
>  with(as.list(coef(res)),
>       curve(w*dnorm(x,m,s)+(1-w)*dnorm(x,m2,s2),add=TRUE,col=2))
>  
>  
>  -- 
>  View this message in context:
>  
>  Sent from the R help mailing list archive at Nabble.com.
>  
>  ______________________________________________
>  R-help at r-project.org mailing list
>  
>  PLEASE do read the posting guide 
>  and provide commented, minimal, self-contained, reproducible code.
>  
>  ______________________________________________
>  R-help at r-project.org mailing list
>  
>  PLEASE do read the posting guide 
>  and provide commented, minimal, self-contained, reproducible code.