[R] An optim() mystery.

Tue Nov 15 21:50:52 CET 2005

I have a Master's student working on a project which involves
estimating parameters of a certain model via maximum likelihood,
with the maximization being done via optim().

A phenomenon has occurred which I am at a loss to explain.

If we use certain pairs of starting values for optim(), it
simply returns those values as the ``optimal'' values, although
they are definitely not optimal.

With other starting values, optim() goes off and finds a reasonable,
optimum, and seems to do so consistently --- i.e. it gets essentially
the same estimates irrespective of starting values.

We have plotted the log likelihood surface and it appears smooth
and relatively innocuous.

The phenomenon only occurs with the "L-BFGS-B"; the default
(Nelder-Mead simplex) method, with a heavy penalty for violating
constraints, seems to work just fine.  So we can get solutions;
it just makes me uneasy that there's this funny going on.

Can anyone shed any light on what the problem is?  I have enclosed
below code to reproduce the phenomenon.  It is probably advisable
to save it in two separate parts:  The first part does some
setting up - creating the data and defining the function to be
optimized.  The second part consists of various calls to optim().
Note that the code ***minimizes*** minus twice the log likelihood.

Be aware that the calls to optim() take quite a while to execute;
like 40 seconds on my laptop.

Eternal gratitude for any words of wisdom on this matter!

BTW, the recalcitrant parameter pair, used in the script below, is
the pair of values which were used to simulate the data ``resi''.  So
they are the ``correct'' values.  But optim() can't know that unless
magic is loose in the world!  Anyhow, there are other pairs, e.g.
c(0.04,0.15),that result in the same behaviour.

				cheers,

					Rolf Turner
					rolf at math.unb.ca

P. S.  The help on optim() says the Nelder-Mead method is
``relatively slow''.  However for this problem optim() seems to come
in about 10 seconds when Nelder-Mead is used, as opposed to about 40
when "L-BFGS-B" is used and runs successfully.

					R. T.

#==+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===
# Script #1:
#==+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===

#
# Preliminary stuff, to set things up:
#

resi <- c(0.0603646889,0.1363603779,0.1523925363,0.2145877518,0.1440157480,
          0.0844998235,-0.1297248347,-0.0552557629,0.0481005640,0.0754532293,
          0.2119027608,0.1851255809,-0.0380954844,-0.0118705151,-0.1094305971,
         -0.0862933436,-0.0275531242,-0.0626877959,-0.0923180186,-0.2725409915,
         -0.2733103385,-0.2806762169,-0.3713376801,-0.4138431049,-0.3037150478,
         -0.3347483204,-0.1598387374,-0.1869063413,-0.1734677727,-0.1610389684,
         -0.2239938894,-0.2186303230,-0.1686406340,-0.2381844523,-0.2115563556,
         -0.3134484958,-0.2570423412,-0.1447961173,0.0321965240,0.0073261698,
         -0.0559209305,-0.0219575807,0.0073407870,0.2049894668,0.0997092007,
          0.2281661594,0.1474324246,0.2203087759,0.1902474412,0.2909445296,
          0.1869771602,0.1673737821,0.0644645886,0.1107366215,0.0222997552,
         -0.0202454896,0.0458960824,0.0082990521,0.0069920731,-0.0400939212,
         -0.1366096522,-0.1103078421,-0.0539834505,-0.0002650046,0.0163391466,
         -0.0713795007,0.1324350576,0.1937169442,0.3131844274,0.3325063151,
          0.4348740892,0.4551717728,0.4049963331,0.4482280578,0.2167847952,
          0.2168151604,0.2233674124,0.2472087180,0.1983512503,0.1823975372,
          0.0483989112,-0.0421543344,-0.0526425804,-0.0590416757,-0.0406552389,
          0.1728161804,0.2281930355,0.2136595821,0.0647976863,0.0781707139,
          0.0860909779,-0.0236073305,0.0817706100,0.1411715048,0.1305741598,
          0.2838075993,0.1642797706,0.1328162057,0.3099161357,0.1859591497)

theta <- seq(0,2*pi,length=101)[-101]

foo <- function(pars,res,theta) {
		eps <- sqrt(.Machine$double.eps)
		big <- sqrt(.Machine$double.xmax)
                k   <- pars[1]
                rho <- pars[2]
		# This line is needed in the Nelder-Mead case:
                if(k < 0 | rho < 0 | rho > 1) return(big)
                Sigma <- k*rho^outer(theta,theta,cdist)
                udv   <- svd(Sigma)
                d     <- udv$d
                if(min(d) < eps) return(big)
                w <- t(udv$u)%*%res
		# Minus twice the log likelihood:
                sum(log(d)) + sum(w^2/d)
        }

cdist <- function(theta1,theta2) {
        d <- abs(theta1-theta2)%%(2*pi)
        ifelse(d>pi,2*pi-d,d)
}

#==+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===
# Script #2:
#==+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===

#
# Demonstration of the problem.
#

# Using the ``bad'' starting values:
pars <- c(0.06,0.25)
R1 <- optim(pars,foo,res=resi,theta=theta, method="L-BFGS-B",
      lower=c(0,0),upper=c(Inf,1))
print(R1)

# Using roughly (rather badly!) estimated starting values:
pars <- c(var(resi),cor(resi[-1],resi[-100]))
R2 <- optim(pars,foo,res=resi,theta=theta, method="L-BFGS-B",
      lower=c(0,0),upper=c(Inf,1))
print(R2)

# Using other starting values:
pars <- c(0.05,0.5)
R3 <- optim(pars,foo,res=resi,theta=theta, method="L-BFGS-B",
      lower=c(0,0),upper=c(Inf,1))
print(R3)

# Using Nelder-Mead and the ``bad'' starting values.
pars <- c(0.06,0.25)
R4 <- optim(pars,foo,res=resi,theta=theta)
print(R4)