# [R] Problem with numerical integration and optimization with BFGS

Deepankar Basu basu.15 at osu.edu
Fri May 25 16:41:00 CEST 2007

Ravi,

Thanks a lot for your detailed suggestions. I will certainly look at the
links that you have sent and the package "mnormt". For the moment, I
have managed to analytically integrate the expression using "pnorm"
along the lines suggested by Prof. Ripley yesterday.

For instance, my first integral becomes the following:

f1 <- function(w1,w0) {

a <- 1/(2*(1-rho2^2)*sigep^2)
b <- (rho2*(w1-w0+delta))/((1-rho2^2)*sigep*sigeta)
c <- ((w1-w0+delta)^2)/(2*(1-rho2^2)*sigeta^2)
d <- muep
k <- 2*pi*sigep*sigeta*(sqrt(1-rho2^2))

b1 <- ((-2*a*d - b)^2)/(4*a) - a*d^2 - b*d - c
b21 <- sqrt(a)*(w1-rho1*w0) + (-2*a*d - b)/(2*sqrt(a))
b22 <- sqrt(a)*(-w1+rho1*w0) + (-2*a*d - b)/(2*sqrt(a))
b31 <- 2*pnorm(b21*sqrt(2)) - 1  # ERROR FUNCTION
b32 <- 2*pnorm(b22*sqrt(2)) - 1  # ERROR FUNCTION
b33 <- as.numeric(w1-rho1*w0>=0)*(b31-b32)

return(sqrt(pi)*(1/(2*k*sqrt(a)))*exp(b1)*b33)
}

for (i in 2:n) {

out1 <- f1(y[i],y[i-1])

}

I have worked out similar expressions for the other two integrals also.

Deepankar

On Fri, 2007-05-25 at 09:56 -0400, Ravi Varadhan wrote:
> Deepankar,
>
> If the problem seems to be in the evaluation of numerical quadrature part,
> you might want to try quadrature methods that are better suited to
> even their adaptive versions such as Gauss-Kronrod, are not best suited for
> integrating because they do not explicitly account for the "peakedness" of
> the integrand, and hence can be inefficient and inaccurate. See the article
> below:
> http://citeseer.ist.psu.edu/cache/papers/cs/18996/http:zSzzSzwww.sci.wsu.edu
> zSzmathzSzfacultyzSzgenzzSzpaperszSzmvn.pdf/genz92numerical.pdf
>
> Alan Genz has worked on this problem a lot and has a number of computational
> tools available. I used some of them when I was working on computing Bayes
> factors for binomial regression models with different link functions.  If
> you are interested, check the following:
>
> http://www.math.wsu.edu/faculty/genz/software/software.html.
>
> For your immediate needs, there is an R package called "mnormt" that has a
> function for computing integrals under a multivariate normal (and
> multivariate t) densities, which is actually based on Genz's Fortran
> routines.  You could try that.
>
> Ravi.
>
>
> ----------------------------------------------------------------------------
> -------
>
>
> Assistant Professor, The Center on Aging and Health
>
> Division of Geriatric Medicine and Gerontology
>
> Johns Hopkins University
>
> Ph: (410) 502-2619
>
> Fax: (410) 614-9625
>
>
>
>
>
> ----------------------------------------------------------------------------
> --------
>
>
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Deepankar Basu
> Sent: Friday, May 25, 2007 12:02 AM
> To: Prof Brian Ripley
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Problem with numerical integration and optimization with
> BFGS
>
> Prof. Ripley,
>
> The code that I provided with my question of course does not contain
> code for the derivatives; but I am supplying analytical derivatives in
> my full program. I did not include that code with my question because
> new information relevant for my question. The problem that I had pointed
> to occurs whether I provide analytical derivatives or not to the
> optimization routine. And the problem was that when I use the "BFGS"
> method in optim, I get an error message saying that the integrals are
> probably divergent; I know, on the other hand, that the integrals are
> convergent. The same problem does not arise when I instead use the
>
> Your suggestion that the expression can be analytically integrated
> (which will involve "pnorm") might be correct though I do not see how to
> do that. The integrands are the bivariate normal density functions with
> one variable replaced by known quantities while I integrate over the
> second.
>
> For instance, the first integral is as follows: the integrand is the
> bivariate normal density function (with general covariance matrix) where
> the second variable has been replaced by
> y[i] - rho1*y[i-1] + delta
> and I integrate over the first variable; the range of integration is
> lower=-y[i]+rho1*y[i-1]
> upper=y[i]-rho1*y[i-1]
>
> The other two integrals are very similar. It would be of great help if
> you could point out how to integrate the expressions analytically using
> "pnorm".
>
> Thanks.
> Deepankar
>
>
> On Fri, 2007-05-25 at 04:22 +0100, Prof Brian Ripley wrote:
> > You are trying to use a derivative-based optimization method without
> > supplying derivatives.  This will use numerical approoximations to the
> > derivatives, and your objective function will not be suitable as it is
> > internally using adaptive numerical quadrature and hence is probably not
> > close enough to a differentiable function (it may well have steps).
> >
> > I believe you can integrate analytically (the answer will involve pnorm),
> > and that you can also find analytical derivatives.
> >
> > Using (each of) numerical optimization and integration is a craft, and it
> > seems you need to know more about it.  The references on ?optim are too
> > references.
> >
> > On Thu, 24 May 2007, Deepankar Basu wrote:
> >
> > > Hi R users,
> > >
> > > I have a couple of questions about some problems that I am facing with
> > > regard to numerical integration and optimization of likelihood
> > > functions. Let me provide a little background information: I am trying
> > > to do maximum likelihood estimation of an econometric model that I have
> > > developed recently. I estimate the parameters of the model using the
> > > monthly US unemployment rate series obtained from the Federal Reserve
> > > Bank of St. Louis. (The data is freely available from their web-based
> > > database called FRED-II).
> > >
> > > For my model, the likelihood function for each observation is the sum of
> > > three integrals. The integrand in each of these integrals is of the
> > > following form:
> > >
> > > A*exp(B+C*x-D*x^2)
> > >
> > > where A, B, C and D are constants, exp() is the exponential function and
> > > x is the variable of integration. The constants A and D are always
> > > positive; B is always negative, while there is no a priori knowledge
> > > about the sign of C. All the constants are finite.
> > >
> > > Of the three integrals, one has finite limits while the other two have
> > > the following limits:
> > >
> > > lower = -Inf
> > > upper = some finite number (details can be found in the code below)
> >
> > Try integrating that analytically by change of variable to a normal CDF.
> >
> >
> > > My problem is the following: when I try to maximize the log-likelihood
> > > function using "optim" with method "BFGS", I get the following error
> > > message (about the second integral):
> > >
> > >> out <- optim(alpha.start, LLK, gr=NULL, method="BFGS", y=urate\$y)
> > > Error in integrate(f3, lower = -Inf, upper = upr2) :
> > >        the integral is probably divergent
> > >
> > > Since I know that all the three integrals are convergent, I do not
> > > understand why I am getting this error message. My first question: can
> > > someone explain what mistake I am making?
> > >
> > > What is even more intriguing is that when I use the default method
> > > (Nelder-Mead) in "optim" instead of BFGS, I do not get any such error
> > > message. Since both methods (Nelder-Mead and BFGS) will need to evaluate
> > > the integrals, my second question is: why this difference?
> > >
> > > Below, I am providing the code that I use. Any help will be greatly
> > > appreciated.
> > >
> > >
> > > Deepankar
> > >
> > >
> > > ************ CODE START *******************
> > >
> > >
> > >
> > > #############################
> > > # COMPUTING THE LOGLIKELIHOOD
> > > # USING NUMERICAL INTEGRALS
> > > #############################
> > >
> > > LLK <- function(alpha, y) {
> > >
> > >  n <- length(y)
> > >  lglik <- numeric(n) # TO BE SUMMED LATER TO GET THE LOGLIKELIHOOD
> > >
> > >           lambda <- numeric(n-1)    # GENERATING *lstar*
> > >           for (i in 1:(n-1)) {      # TO USE IN THE
> > >           lambda[i] <- y[i+1]/y[i]  # RE-PARAMETRIZATION BELOW
> > >           }
> > >           lstar <- (min(lambda)-0.01)
> > >
> > >
> > > # NOTE RE-PARAMETRIZATION
> > > # THESE RESTRICTIONS EMERGE FROM THE MODEL
> > >
> > >  muep <- alpha[1]                                      # NO RESTRICTION
> > >  sigep <-  0.01 + exp(alpha[2])                        # greater than
> > > 0.01
> > >  sigeta <- 0.01 + exp(alpha[3])                        # greater than
> > > 0.01
> > >  rho2 <- 0.8*sin(alpha[4])                             # between -0.8
> > > and 0.8
> > >  rho1 <- lstar*abs(alpha[5])/(1+abs(alpha[5]))         # between 0 and
> > > lstar
> > >  delta <- 0.01 + exp(alpha[6])                         # greater than
> > > 0.01
> > >
> > >
> > > ##########################################
> > > # THE THREE FUNCTIONS TO INTEGRATE
> > > # FOR COMPUTING THE LOGLIKELIHOOD
> > > ##########################################
> > >
> > >  denom <- 2*pi*sigep*sigeta*(sqrt(1-rho2^2)) # A CONSTANT TO BE USED
> > >                                              # FOR DEFINING THE
> > >                                              # THREE FUNCTIONS
> > >
> > >
> > >  f1 <- function(z1) {  # FIRST FUNCTION
> > >
> > >       b11 <- ((z1-muep)^2)/((-2)*(1-rho2^2)*(sigep^2))
> > >       b12 <-
> > > (rho2*(z1-muep)*(y[i]-y[i-1]+delta))/((1-rho2^2)*sigep*sigeta)
> > >       b13 <- ((y[i]-y[i-1]+delta)^2)/((-2)*(1-rho2^2)*(sigeta^2))
> > >
> > >     return((exp(b11+b12+b13))/denom)
> > >  }
> > >
> > >  f3 <- function(z3) { # SECOND FUNCTION
> > >
> > >       b31 <- ((y[i]-rho1*y[i-1]-muep)^2)/((-2)*(1-rho2^2)*(sigep^2))
> > >       b32 <-
> > > (rho2*(y[i]-rho1*y[i-1]-muep)*z3)/((1-rho2^2)*sigep*sigeta)
> > >       b33 <- ((z3)^2)/((-2)*(1-rho2^2)*(sigeta^2))
> > >
> > >     return((exp(b31+b32+b33))/denom)
> > >  }
> > >
> > >  f5 <- function(z5) { # THIRD FUNCTION
> > >
> > >       b51 <- ((-y[i]+rho1*y[i-1]-muep)^2)/((-2)*(1-rho2^2)*sigep^2)
> > >       b52 <-
> > > (rho2*(-y[i]+rho1*y[i-1]-muep)*(z5))/((1-rho2^2)*sigep*sigeta)
> > >       b53 <- ((z5)^2)/((-2)*(1-rho2^2)*(sigeta^2))
> > >
> > >     return((exp(b51+b52+b53))/denom)
> > >  }
> > >
> > >
> > >  for (i in 2:n) {   # START FOR LOOP
> > >
> > >        upr1 <- (y[i]-rho1*y[i-1])
> > >        upr2 <- (y[i]-y[i-1]+delta)
> > >
> > >     # INTEGRATING THE THREE FUNCTIONS
> > >      out1 <- integrate(f1, lower = (-1)*upr1, upper = upr1)
> > >      out3 <- integrate(f3, lower = -Inf, upper = upr2)
> > >      out5 <- integrate(f5, lower= -Inf, upper = upr2)
> > >
> > >       pdf <- (out1\$val + out3\$val + out5\$val)
> > >
> > >     lglik[i] <- log(pdf) # LOGLIKELIHOOD FOR OBSERVATION i
> > >
> > >     }               # END FOR LOOP
> > >
> > > return(-sum(lglik)) # RETURNING NEGATIVE OF THE LOGLIKELIHOOD
> > >                     # BECAUSE optim DOES MINIMIZATION BY DEFAULT
> > > }
> > >
> > > ***************** CODE ENDS *********************************
> > >
> > > Then I use:
> > >
> > >> alpha.start <- c(0.5, -1, -1, 0, 1, -1) # STARTING VALUES
> > >> out <- optim(alpha.start, LLK, gr=NULL, y=urate\$y) # THIS GIVES NO
> > > ERROR
> > >
> > > or
> > >
> > >> out <- optim(alpha.start, LLK, gr=NULL, method="BFGS", y=urate\$y)
> > > Error in integrate(f3, lower = -Inf, upper = upr2) :
> > >        the integral is probably divergent
> > >
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help