[R] problem with convergence in mle2/optim function

Berend Hasselman bhh at xs4all.nl
Mon Oct 8 12:44:05 CEST 2012

See below.

On 08-10-2012, at 07:39, Adam Zeilinger wrote:

> Dear R Help,
>
> Thank you to those who responded to my questions about mle2/optim convergence.  A few responders pointed out that the optim error seems to arise when either one of the probabilities P1, P2, or P3 become negative or infinite.  One suggested examining the exponential terms within the P1 and P2 equations.
>
> I may have made some progress along these lines.  The exponential terms in the equations for P1 and P2 go to infinity at certain (large) values of t. The exponential terms can be found in lines 1, 5, and 7 in the P1 and P2 expressions below.  Here is some example code:
>
> ###########################################################################
> # P1 and P2 equations
> P1 <- expression((p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
>  4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) +
>  mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 -
>  4*(mu2*p1 + mu1*(mu2 + p2))) -
>  exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
>  mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
>  2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
>  4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu2*
>  sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
>  exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
>  4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
>  sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))
> P2 <- expression((p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
>  4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 +
>  mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + mu2 + p1 + p2)^2 -
>  4*(mu2*p1 + mu1*(mu2 + p2))) -
>  exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
>  mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
>  2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
>  4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu1*
>  sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
>  exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
>  4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
>  sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))
>
> # Vector of t values
> tv <- c(1:200)
>
> # 'true' parameter values
> p1t = 2; p2t = 2; mu1t = 0.001; mu2t = 0.001
>
> # Function to calculate probabilities from 'true' parameter values
> psim <- function(x){
>  params <- list(p1 = p1t, p2 = p2t, mu1 = mu1t, mu2 = mu2t, t = x)
>  eval.P1 <- eval(P1, params)
>  eval.P2 <- eval(P2, params)
>  P3 <- 1 - eval.P1 - eval.P2
>  c(x, matrix(c(eval.P1, eval.P2, P3), ncol = 3))
> }
> pdat <- sapply(tv, psim, simplify = TRUE)
> Pdat <- as.data.frame(t(pdat))
> names(Pdat) <- c("time", "P1", "P2", "P3")
>
> matplot(Pdat[,-1], type = "l", xlab = "time", ylab = "Probability",
>        col = c("black", "brown", "blue"),
>        lty = c(1:3), lwd = 2, ylim = c(0,1))
> legend("topright", c("P1", "P2", "P3"),
>       col = c("black", "brown", "blue"),
>       lty = c(1:3), lwd = 2)
> Pdat[160:180,] # psim function begins to return "NaN" at t = 178
>
> # exponential terms in P1 and P2 expressions are problematic
> params <- list(p1 = p1t, p2 = p2t, mu1 = mu1t, mu2 = mu2t, t = 178)
>
> exp1 <- expression(exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
>  4*(mu2*p1 + mu1*(mu2 + p2)))*t))
> eval(exp1, params) # returns Inf at t = 178
>
> exp2 <- expression(exp((1/2)*(mu1 + mu2 + p1 + p2 +
>  sqrt((mu1 + mu2 + p1 + p2)^2 -
>  4*(mu2*p1 + mu1*(mu2 + p2))))*t))
> eval(exp2, params) # also returns Inf at t = 178
> ##########################################################################
>
> The time step at which the exponential terms go to infinity depends on the values of the parameters p1, p2, mu1, mu2.  It seems that the convergence problems may be due to these exponential terms going to infinity.  Thus my convergence problem appears to be an overflow problem?
>

It is quite possible that letting t becoming larger and larger is going to get you into trouble here,
But the maximum value of t in your initial post was 20.
So it's unlikely that the value of t was causing the problems with the calculation of the gradient (error message "non-finite finite-difference value [3]").

If you change the lower bounds for mu1 and mu2 to something slightly larger than 0 mle2 converges. Because now optim has enough leeway to calculate a finite difference. At least I think that is the cause of your initial problem.

> # mle2 call
> mle.fit <- mle2(NLL.func, data = list(y = yt),
+                start = list(p1 = p1t, p2 = p2t, mu1 = mu1t, mu2 = mu2t),
+                control = list(trace=1,maxit = 5000, factr = 1e-5, lmm = 17),
+                method = "L-BFGS-B", skip.hessian = TRUE,
+                lower = list(p1 = 0, p2 = 0, mu1 = 0.0001, mu2 = 0.0001))
iter    0 value -1110.031664
iter   10 value -1110.315437
iter   20 value -1110.608553
final  value -1110.609914
converged

> mle.fit

Call:
mle2(minuslogl = NLL.func, start = list(p1 = p1t, p2 = p2t, mu1 = mu1t,
mu2 = mu2t), method = "L-BFGS-B", data = list(y = yt), skip.hessian = TRUE,
control = list(trace = 1, maxit = 5000, factr = 1e-05, lmm = 17),
lower = list(p1 = 0, p2 = 0, mu1 = 1e-04, mu2 = 1e-04))

Coefficients:
p1       p2      mu1      mu2
1.471146 1.618261 0.000100 0.000100

Log-likelihood: 1110.61

> Unfortunately, I'm not sure where to go from here.  Due to the complexity of the P1 and P2 equations, there's no clear way to rearrange the equations to eliminate t from the exponential terms.
>
> Does anyone have any suggestions on how to address this problem? Perhaps there is a way to bound p1, p2, mu1, and mu2 to avoid the exponential terms going to infinity?  Or bound P1 and P2?
>

If you were to transform  P1, P2 and P3 to something between  0 and 1 you would be changing the model.

Berend