[R] Safeguarded Newton method for function minimization

J C Nash profjcnash at gmail.com
Wed Apr 19 16:06:13 CEST 2017


I should have given a more detailed explanation of Automatic Differentiation.
As Martin points out, there are some AD elements in deriv() etc., but in my view
they belong more to symbolic differentiation, and are not in the same space
as AD Model Builder or its successor, whose name eludes me at the moment.
Nor are they like ADIFOR or ADOL-C, which can provide the derivative of a whole subprogram.

However, Martin's example with the Wood function shows what can be done, and this
is a valuable case. Indeed it shows a syntax of which I was previously unaware!
Because it is fairly complicated, I'm not sure I would say "effortlessly", but I
will certainly be copying it. Thanks. Previously, I have carried out part of the
work manually to get the expression into the functional form, as in Chapter 10 of my
Nonlinear Parameter Optimization book.

Note that package nlsr, by Duncan Murdoch and me, has functions written entirely
in R that are similar to deriv() and deriv3(). There is a facility to extend the
derivatives table to new functions, and we can also generate R gradient functions
from expressions; a small sketch follows.
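As a quick illustration (using base R's deriv(), which I am sure of; the nlsr
calls in the comments -- fnDeriv() and newDeriv() -- are from memory and should
be checked against the package manual):

    ## build a gradient function from an expression with base R deriv()
    g1 <- deriv(expression(100*(x1^2 - x2)^2 + (1 - x1)^2),
                c("x1", "x2"), function.arg = TRUE)
    g1(x1 = -1.2, x2 = 1)   # expression value with a "gradient" attribute

    ## nlsr (all in R) offers analogous tools, to my recollection:
    ##   fnDeriv(quote(100*(x1^2 - x2)^2 + (1 - x1)^2), c("x1", "x2"))
    ##   newDeriv()   # to add a new function to the derivatives table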

I'll take this chance to note that the Wood problem is NOT solved by nlm() with
an analytic hessian, nor by a safeguarded or unsafeguarded Newton method. I also
ran a version of the Fortran code UNCMIN, from which nlm() derives, and got a
similar early termination. However, the hessian at the point where these codes
halt is not positive definite, so we are outside the assumed conditions for these
methods. Dealing with a non-positive-definite hessian in function minimization has
generated a small industry of PhD theses among optimization researchers.
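
For the archives, here is a minimal sketch of one standard remedy (a
Levenberg-style modification -- this is NOT the algorithm in nlm() or in my
R-forge package): inflate the diagonal of the hessian until a Cholesky
factorization succeeds, then take a damped Newton step with a simple backtracking
safeguard that insists on a function decrease. The function name and tolerances
below are only illustrative.

    ## one modified (safeguarded) Newton step; fn, gr, hess supplied by the user
    snewton_step <- function(x, fn, gr, hess, lambda = 1e-4) {
      g <- gr(x)
      H <- hess(x)
      repeat {                     # add lambda*I until the factorization succeeds
        R <- tryCatch(chol(H + lambda * diag(length(x))),
                      error = function(e) NULL)
        if (!is.null(R)) break
        lambda <- 10 * lambda
      }
      d <- backsolve(R, forwardsolve(t(R), -g))  # solve (H + lambda*I) d = -g
      step <- 1
      while (fn(x + step * d) > fn(x) && step > 1e-10)  # safeguard: demand a decrease
        step <- step / 2
      x + step * d
    }

Iterating such a step until the gradient is small is, of course, only a rough
outline; for the Wood function the gradient and hessian could be taken from
Martin's woodf.gh() in the quoted message below.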

Thanks again for Martin's work.

JN




On 2017-04-19 08:56 AM, Martin Maechler wrote:
>>>>>> J C Nash <profjcnash at gmail.com>
>>>>>>     on Tue, 18 Apr 2017 13:32:52 -0400 writes:
>
>     > Recently Marie Boehnstedt reported a bug in the nlm()
>     > function for function minimization when both gradient and
>     > hessian are provided.
> Indeed, on R's Bugzilla here :
> 	https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17249
>
>     > She has provided a work-around for some cases and it seems
>     > this will get incorporated into the R function eventually.
>
> indeed.... the first part -- fixing the wrong choldc() -- in the C code has
> been in my version of R-devel for a while now.
>
> See my follow up
>     https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17249#c4  and #c5
>
> including my attachment which is an extended version of Marie's original :
>     https://bugs.r-project.org/bugzilla/attachment.cgi?id=2246
>
> As that mentions _and_ shows: Fixing choldc() solves the problem
> for the 2D Rosenbrock example, but _not_ the 4D Wood example.
>
>     > However, despite the great number of packages on CRAN,
>     > there does not appear to be a straightforward Newton
>     > approach to function minimization. This may be because
>     > providing the code for a hessian (the matrix of second
>     > derivatives) is a lot of work and error-prone.
>
>     > (R could also use some good tools for Automatic Differentiation).
>
> The last part of what you say above is not at all true:
> R -- and S (& S+ | S-PLUS) before it! -- always had deriv() and deriv3()
> and my attachment above _shows_ you how to use  deriv3() to get
> both the gradient and the hessian  via a version of automatic
> differentiation completely effortlessly !!
>
> For ease of readers, that part here, with an example:
>
> ##' Wood function (4 arguments 'x1' ... 'x4')
> fwood <- function(x1,x2,x3,x4) {
>   100*(x1^2-x2)^2 + (1-x1)^2 + 90*(x3^2-x4)^2 + (1-x3)^2 +
>     10.1*((1-x2)^2 + (1-x4)^2) + 19.8*(1-x2)*(1-x4)
> }
> ## automatically construct correct gradient and hessian:
> woodf.gh <- function(x) {
>   stopifnot(is.numeric(x))
>   woodGH <- deriv3(body(fwood)[[2]],
>                    c("x1","x2","x3","x4"), function.arg=TRUE)
>   if(length(x) == 4)
>     woodGH(x[1],x[2],x[3],x[4])
>   else if(is.matrix(x) && ncol(x) == 4)
>     woodGH(x[,1], x[,2], x[,3], x[,4])
>   else stop("'x' must have length 4 or be a matrix with 4 columns")
> }
>
> and now get the function value f(x), gradient g(x) and Hessian H(x)
> for
>      x = (0 0 0 0),
>      x = (1 1 1 1), and
>      x = (1 2 3 4)
>
> with such a simple call:
>
>   > woodf.gh(rbind(0, 1, 1:4))
>   [1]   42.0    0.0 2514.4
>   attr(,"gradient")
> 	 x1    x2   x3     x4
>   [1,]   -2 -40.0   -2  -40.0
>   [2,]    0   0.0    0    0.0
>   [3,] -400 279.6 5404 -819.6
>   attr(,"hessian")
>   , , x1
>
> 	x1   x2 x3 x4
>   [1,]   2    0  0  0
>   [2,] 802 -400  0  0
>   [3,] 402 -400  0  0
>
>   , , x2
>
> 	 x1    x2 x3   x4
>   [1,]    0 220.2  0 19.8
>   [2,] -400 220.2  0 19.8
>   [3,] -400 220.2  0 19.8
>
>   , , x3
>
>        x1 x2   x3    x4
>   [1,]  0  0    2     0
>   [2,]  0  0  722  -360
>   [3,]  0  0 8282 -1080
>
>   , , x4
>
>        x1   x2    x3    x4
>   [1,]  0 19.8     0 200.2
>   [2,]  0 19.8  -360 200.2
>   [3,]  0 19.8 -1080 200.2
>
>   >
>
>     > I have also noted that a number of researchers try to
>     > implement textbook methods and run into trouble when maths
>     > and computing are not quite in sync. Therefore, I wrote a
>     > simple safeguarded Newton and put a small package on
>     > R-forge at
>
>     > https://r-forge.r-project.org/R/?group_id=395
>
>     > Note that Newton's method is used to solve nonlinear
>     > equations. In fact, for function minimization, we apply it
>     > to solve g(x) = 0 where g is the gradient and x is the
>     > vector of parameters. In part, safeguards ensure we reduce
>     > the function f(x) at each step to avoid some of the
>     > difficulties that may arise from a non-positive-definite
>     > hessian.
>
>     > In the package, I have a very simple quadratic test, the
>     > Rosenbrock test function and the Wood test function. The
>     > method fails on the last function -- the hessian is not
>     > positive definite where it stops.
>
>     > Before submitting this package to CRAN, I would like to
>     > see its behaviour on other test problems, but am lazy
>     > enough to wish to avoid creating the hessian code. If
>     > anyone has such code, it would be very welcome. Please
>     > contact me off-list. If I get some workable examples that
>     > are open for public view, I'll report back here.
>
>     > John Nash
>
>     > ______________________________________________
>     > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>     > https://stat.ethz.ch/mailman/listinfo/r-help
>     > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>     > and provide commented, minimal, self-contained, reproducible code.
>


