[R] fitting a mixture of distributions with optim and max log likelihood ? Clarifications

Bob Sandefur Bob.Sandefur at pincock.com
Wed Aug 29 00:10:00 CEST 2001


Hi-
 Professor Ripley raised some questions on my first email:
 Are you on XP?
 I am on Release candidate 1 from MSDN professional subscriptions,
actually.
Anyone (although perhaps only in US) can
  get about the same thing from the MicroSoft Preview program for about
$20
US. (This release is time limited). cygwin works
 for me so far but I only use the shell, shell scripts, awk and sed on a
regular basis (gcc and g77 are okay as far as I have used
them) and the cygwin installer happly updated my windows XP  verison

What were warnings?
lWarning messages:
1: NaNs produced in: log(x)
...
22: NaNs produced in: log(x)

I really want is to fit mixtures of 2 or 3 lognormal distributions that
are
discrete to the nearest 0.001 (gold assay
values in ounces/ton) with both a normal error of about + or - 0.003 and
a
proportional error of about + or - 5%
constrained to be non negative so I think analytical derivatives are out
the
window.  

Could you point me to a reference on log-sds?

thanx

bob

Thread follows:




>>> Prof Brian D Ripley 08/28/01 03:10PM >>>
On Tue, 28 Aug 2001, Bob Sandefur wrote:

> hi
>  Suppose I have a mixture of 2 distributions generated by

This is well-known difficult optimization problem with many local
maxima.
I think in particular that your test example has too broad a second
normal for this to work well.

The default method for optim() is slow, not very precise but very
reliable.
I very rarely use it.  With a problem like this where derivatives are
readily available I would BFGS with analytical derivatives and multiple
starting points to check for local maxima.

BTw, what are those warnings?  You actually have a constrained
space for the optimization, and I would have worked with log-sds
to avoid that.

>
> rtwonormals <- function(npnt,m1,s1,m2,s2,p2){
>  rv<-vector(npnt,mode="numeric")
>  for( i in seq(1:npnt)){
>   if(runif(1,0,1)<=p2){
>    rv[i]<-rnorm(1,m2,s2)
>    }
>   else{
>    rv[i]<-rnorm(1,m1,s1)
>   }
>  }
>  return(rv)
>  }
> x <- rtwonormals(50000,0,100,500,500,0.05)
>
> #and I try to fit these with  (based on thread:  [R] Estimating
Weibull
Distribution Parameters - very basic question)
>
> loglike<-function(p)
-2*sum(log((1-p[5])*dnorm(x,p[1],p[2])+p[5]*dnorm(x,p[3],p[4])))
> optim(c(-20,150,400,600,.035),loglike)
> optim(c(20,70,600,400,.095),loglike)
> optim(c(0,100,500,500,.05),loglike)
> optim(c(-20,150,400,600,.035),loglike)
>
> # three different starting values (1 and 4 the same to check
reproducablity)
I get:
>
> Version 1.3.0  (2001-06-22) on windoze XP

Interesting.  I thought XP was to be released on Oct 25, but there were
rumours that MicroSoft had changed this.  Do you really have XP, or
a preview?

If this really does run under the real XP, we should add that to the
readme etc.

Quite a few things, including our Cygwin tools, do not run under XP,
allegedly.

> > optim(c(-20,150,400,600,.035),loglike)
> $par
> [1]   1.28597210 100.53443070 550.06070497 615.06936388   0.04563778
> $value
> [1] 622843.1
> $counts
> function gradient
>      493       NA
> $convergence
> [1] 0
> $message
> NULL
> There were 22 warnings (use warnings() to see them)
>
> > optim(c(20,70,600,400,.095),loglike)
> $par
> [1]   0.62742812 100.15891023 533.25825184 514.63882147   0.04670099
> $value
> [1] 622692.7
> $couns
> function gradient
>      501       NA
> $convergence
> [1] 1    < OPPS
> $message
> NULL
> There were 22 warnings (use warnings() to see them)
>
> > optim(c(0,100,500,500,.05),loglike)
> $par
> [1]   0.56254342 100.03881574 499.47434961 505.59785487   0.04805347
> $value
> [1] 622685
> $counts
> function gradient
>      109       NA
> $convergence
> [1] 0
> $message
> NULL
> There were 21 warnings (use warnings() to see them)
>
> > optim(c(-20,150,400,600,.035),loglike)
> $par
> [1]   1.28597210 100.53443070 550.06070497 615.06936388   0.04563778
> $value
> [1] 622843.1
> $counts
> function gradient
>      493       NA
> $convergence
> [1] 0
> $message
> NULL
> There were 22 warnings (use warnings() to see them)
>
>
> Questions:
> 1) Did I mess up anything in the formulae?
> 2) Any suggestions for converging to the same value?
> 3) Any suggestions for other methods to get means, stddevs and
proportions
of the mixture of distributions?
>
> Thanx
>
> Robert (Bob) L Sandefur
> Principal Geostatistician
> Pincock Allen & Holt (A Hart Crowser Company)
>  International Minerals Consultants
>  274 Union Suite 200
>  Lakewood CO 80228
>  USA
>  303 914-4467  v
>  303 987-8907 f
> rls at pincock.com 
>
>
> ~
> ~
>
>
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
-.-.-.-
> r-help mailing list -- Read
http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html 
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To:
r-help-request at stat.math.ethz.ch 
>
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
_._._._
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk 
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/ 
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://stat.ethz.ch/pipermail/r-help/attachments/20010828/916998c7/attachment.html


More information about the R-help mailing list