# [R] problem with convergence in mle2/optim function

Bert Gunter gunter.berton at gene.com
Fri Oct 5 08:51:27 CEST 2012

```Presumably you checked out the CRAN Optimization task view to see if
you could find any joy there, right? (I doubt there is, but ...)

-- Bert

On Thu, Oct 4, 2012 at 10:12 PM, Adam Zeilinger <zeil0006 at umn.edu> wrote:
> Hello R Help,
>
> I am trying solve an MLE convergence problem: I would like to estimate four
> parameters, p1, p2, mu1, mu2, which relate to the probabilities, P1, P2, P3,
> of a multinomial (trinomial) distribution.  I am using the mle2() function
> and feeding it a time series dataset composed of four columns: time point,
> number of successes in category 1, number of successes in category 2, and
> number of success in category 3.  The column headers are: t, n1, n2, and n3.
>
> The mle2() function converges occasionally, and I need to improve the rate
> of convergence when used in a stochastic simulation, with multiple
> stochastically generated datasets.  When mle2() does not converge, it
> returns an error: "Error in optim(par = c(2, 2, 0.001, 0.001), fn = function
> (p) : L-BFGS-B needs finite values of 'fn'."  I am using the L-BFGS-B
> optimization method with a lower box constraint of zero for all four
> parameters.  While I do not know any theoretical upper limit(s) to the
> parameter values, I have not seen any parameter estimates above 2 when using
> empirical data.  It seems that when I start with certain 'true' parameter
> values, the rate of convergence is quite high, whereas other "true"
> parameter values are very difficult to estimate.  For example, the true
> parameter values p1 = 2, p2 = 2, mu1 = 0.001, mu2 = 0.001 causes convergence
> problems, but the parameter values p1 = 0.3, p2 = 0.3, mu1 = 0.08, mu2 =
> 0.08 lead to high convergence rate.  I've chosen these two sets of values
> because they represent the upper and lower estimates of parameter values
> derived from graphical methods.
>
> First, do you have any suggestions on how to improve the rate of convergence
> and avoid the "finite values of 'fn'" error?  Perhaps it has to do with the
> true parameter values being so close to the boundary?  If so, any
> suggestions on how to estimate parameter values that are near zero?
>
> Here is reproducible and relevant code from my stochastic simulation:
>
> ########################################################################
> library(bbmle)
> library(combinat)
>
> # define multinomial distribution
> dmnom2 <- function(x,prob,log=FALSE) {
>   r <- lgamma(sum(x) + 1) + sum(x * log(prob) - lgamma(x + 1))
>   if (log) r else exp(r)
> }
>
> # vector of time points
> tv <- 1:20
>
> # Negative log likelihood function
> NLL.func <- function(p1, p2, mu1, mu2, y){
>   t <- y\$tv
>   n1 <- y\$n1
>   n2 <- y\$n2
>   n3 <- y\$n3
>   P1 <- (p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
>     4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) +
>     mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 -
>     4*(mu2*p1 + mu1*(mu2 + p2))) -
>     exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
>     mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
>     2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
>     4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu2*
>     sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
>     exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
>     4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
>     sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))
>   P2 <- (p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
>     4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 +
>     mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + mu2 + p1 + p2)^2 -
>     4*(mu2*p1 + mu1*(mu2 + p2))) -
>     exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
>     mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
>     2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
>     4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu1*
>     sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
>     exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
>     4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
>     sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))
>   P3 <- 1 - P1 - P2
>   p.all <- c(P1, P2, P3)
>   -sum(dmnom2(c(n1, n2, n3), prob = p.all, log = TRUE))
> }
>
> ## Generate simulated data
> # Model equations as expressions,
> P1 <- expression((p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
>   4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) +
>   mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 -
>   4*(mu2*p1 + mu1*(mu2 + p2))) -
>   exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
>   mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
>   2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
>   4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu2*
>   sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
>   exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
>   4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
>   sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))
>
> P2 <- expression((p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
>   4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 +
>   mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + mu2 + p1 + p2)^2 -
>   4*(mu2*p1 + mu1*(mu2 + p2))) -
>   exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
>   mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
>   2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
>   4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu1*
>   sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
>   exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
>   4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
>   sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))
>
> # True parameter values
> p1t = 2; p2t = 2; mu1t = 0.001; mu2t = 0.001
>
> # Function to calculate probabilities from 'true' parameter values
> psim <- function(x){
>   params <- list(p1 = p1t, p2 = p2t, mu1 = mu1t, mu2 = mu2t, t = x)
>   eval.P1 <- eval(P1, params)
>   eval.P2 <- eval(P2, params)
>   P3 <- 1 - eval.P1 - eval.P2
>   c(x, matrix(c(eval.P1, eval.P2, P3), ncol = 3))
> }
> pdat <- sapply(tv, psim, simplify = TRUE)
> Pdat <- as.data.frame(t(pdat))
> names(Pdat) <- c("time", "P1", "P2", "P3")
>
> # Generate simulated data set from probabilities
> n = rep(20, length(tv))
> p = as.matrix(Pdat[,2:4])
> y <- as.data.frame(rmultinomial(n,p))
> yt <- cbind(tv, y)
> names(yt) <- c("tv", "n1", "n2", "n3")
>
> # mle2 call
> mle.fit <- mle2(NLL.func, data = list(y = yt),
>                 start = list(p1 = p1t, p2 = p2t, mu1 = mu1t, mu2 = mu2t),
>                 control = list(maxit = 5000, factr = 1e-10, lmm = 17),
>                 method = "L-BFGS-B", skip.hessian = TRUE,
>                 lower = list(p1 = 0, p2 = 0, mu1 = 0, mu2 = 0))
>
> ###########################################################################
>
> I interpret the error as having to do with the finite difference
> approximation failing.  If so, perhaps a gradient function would help? If
> you agree, I've described my unsuccessful attempt at writing a gradient
> function below.  If a gradient function is unnecessary, ignore the remainder
> of this message.
>
> My gradient function: I derived the gradient function by taking the
> derivative of my NLL equation with respect to each parameter.  My NLL
> equation is the probability mass function of the trinomial distribution.
> Thus the gradient equation for, say, parameter p1 would be:
>
> gr.p1 <- deriv(log(P1^n1), p1) + deriv(log(P2^n2), p1) + deriv(log(P3^n3),
> p1)
>
> This produces a very large equation, which I won't reproduce here. Let's say
> that the four gradient equations for the four parameters are defined as
> gr.p1, gr.p2, gr.mu1, gr.mu2, and all are derived as described above for
> gr.p1.  These gradient equations are functions of p1, p2, mu1, mu2, t, n1,
> n2, and n3.  My current gradient function is:
>
> grr <- function(p1, p2, mu1, mu2, y){
>   t <- y[,1]
>   n1 <- y[,2]
>   n2 <- y[,3]
>   n3 <- y[,4]
>   gr.p1 <- .......
>   gr.p2 <- .......
>   gr.mu1 <- .......
>   gr.mu2 <- .......
>   c(gr.p1, gr.p2, gr.mu1, gr.mu2)
> }
>
> The problem is that I need to supply values for t, n1, n2, and n3 to the
> gradient function, which are from the dataset yt, above.  When I supply the
> dataset yt, the function produces a vector of length 4*nrow(yt) = 80.  When
> I include it in my mle2() function, I get an error that mle2 (optim)
> requires a vector of length 4.  How do I write my gradient function to work
> in mle2()?
>
> Any help would be much appreciated.
>
>
> --
> Post Doctoral Scholar
> Department of Entomology
> University of California Riverside
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> and provide commented, minimal, self-contained, reproducible code.

--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website: