[R] convergence=0 in optim and nlminb is real?

Adelchi Azzalini azzalini at stat.unipd.it
Tue Dec 17 22:54:03 CET 2013


It was not my suggestion that an optimizer should check the Hessian on
every occasion (this would be both time-consuming and meaningless),
but I expected it to do so before claiming that a point is a minimum,
that is, only for the candidate final point.
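
To be concrete, here is a minimal sketch of the check I have in mind,
where f and p0 stand in for a generic objective function and starting
point (not the actual ones of my example):

  opt <- optim(p0, f, method="BFGS", hessian=TRUE)
  if (opt$convergence == 0) {
    ev <- eigen(opt$hessian, symmetric=TRUE, only.values=TRUE)$values
    if (any(ev <= 0))
      warning("convergence=0, but the Hessian is not positive definite")
  }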

Nor have I ever thought that nonlinear optimization is a cursory
operation, especially when the dimensionality is not small. For
exactly this reason, I expect an optimizer to take stringent
precautions before claiming to have completed its job successfully.

AA



On 17 Dec 2013, at 18:18, Prof J C Nash (U30A) wrote:

> As indicated, if optimizers checked Hessians on every occasion, R
> would enrich all the computer manufacturers. In this case the problem
> is not too large, so the check is worth doing.
>
> However, for this problem the Hessian is being evaluated by numerical
> approximation of the second partial derivatives, so it may be little
> more than a fiction of the analytic Hessian. I've seen plenty of
> Hessian approximations that were not positive definite even though
> the answers were OK.
>
> That Inf is allowed does not mean that it is recommended. R is very
> tolerant of many things that are not generally good ideas. That can
> be helpful for some computations, but still cause trouble. It seems
> that it is not the problem here.
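>
> If one did want to avoid Inf, a common device is to return a large
> finite penalty instead; a minimal sketch, with f0 standing in for the
> actual (hypothetical) unpenalized objective:
>
>   f <- function(p) {
>     v <- f0(p)                      # unpenalized objective
>     if (is.finite(v)) v else 1e10   # large finite penalty, not Inf
>   }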
>
> I did not look at all the results for this problem from optimx, but
> it appeared that several were lower than the optim(BFGS) one. Are any
> of the optimx results acceptable? Note that optimx DOES offer to
> check the KKT conditions, and defaults to doing so unless the problem
> is large. That was included precisely because the optimizers
> themselves generally avoid this very expensive computation. But given
> the range of results from the optimx answers using "all methods", I'd
> still want to do a lot of testing of the results.
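>
> Something along these lines, with p0 and f again standing in for the
> actual starting values and objective (the column names are from my
> recollection of optimx; check against the current package):
>
>   library(optimx)
>   res <- optimx(p0, f, control = list(all.methods = TRUE, kkt = TRUE))
>   # convcode, kkt1 (small gradient) and kkt2 (positive definite
>   # Hessian) are reported for each method
>   res[order(res$value), c("value", "convcode", "kkt1", "kkt2")]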
>
> This may be a useful case to point out that nonlinear optimization is
> not a calculation that should be taken for granted. It is much less
> reliable than most users think. I rarely find ANY problem for which
> all the optimx methods return the same answer. You really do need to
> look at the answers and make sure that they are meaningful.
>
> JN
>
> On 13-12-17 11:32 AM, Adelchi Azzalini wrote:
>> On Tue, 17 Dec 2013 08:27:36 -0500, Prof J C Nash (U30A) wrote:
>>
>> PJCN> If you run all methods in package optimx, you will see results
>> PJCN> all over the western hemisphere. I suspect some nasty
>> PJCN> computational issues. Possibly the replacement of the function
>> PJCN> value with Inf when any eigenvalues < 0 or nu < 0 is one
>> PJCN> source of this.
>>
>> A value Inf is allowed, as indicated in this passage from the
>> documentation of optim:
>>
>>  Function fn can return NA or Inf if the function cannot be evaluated
>>  at the supplied value, but the initial value must have a computable
>>  finite value of fn.
>>
>> Incidentally, the documentation of optimx includes the same sentence.
>>
>> However, this aspect is not crucial anyway, since the point selected
>> by optim is within the feasible region (by a good margin), and the
>> evaluation of the Hessian matrix occurs at this point.
>>
>> PJCN>
>> PJCN> Note that Hessian eigenvalues are not used to determine
>> PJCN> convergence in optimization methods. If they were, nobody
>> PJCN> under 100 would ever get promoted from junior lecturer,
>> PJCN> because determining the Hessian from just the function
>> PJCN> requires two levels of approximate derivatives.
>>
>> At the end of the optimization process, when a point is about to be
>> declared a minimum, I expect the optimizer to check that it really
>> *is* a minimum. It may do this in ways other than computing the
>> eigenvalues, but it must be done somehow. Actually, I first noticed
>> the problem by attempting inversion of the Hessian (to get standard
>> errors) under the assumption of positive definiteness, and it
>> failed. For instance
>>
>>  mnormt:::pd.solve(opt$hessian)
>>
>> says "x appears to be not positive definite". This check does not
>> involve a further level of approximation.
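>>
>> For the record, the same conclusion can be reached without pd.solve;
>> a sketch, with opt the object returned by optim(..., hessian=TRUE):
>>
>>  H  <- (opt$hessian + t(opt$hessian))/2  # symmetrize the numerical Hessian
>>  ev <- eigen(H, symmetric=TRUE, only.values=TRUE)$values
>>  min(ev)   # a negative value means H is not positive definite
>>  # equivalently, chol() fails on a non positive definite matrix:
>>  inherits(try(chol(H), silent=TRUE), "try-error")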
>>
>> PJCN>
>> PJCN> If you want to get this problem reliably solved, I think you
>> PJCN> will need to
>> PJCN> 1) sort out a way to avoid the Inf values -- can you constrain
>> PJCN> the parameters away from such areas, or at least not use Inf?
>> PJCN> This messes up the gradient computation and hence the
>> PJCN> optimizers, and also the final Hessian.
>> PJCN> 2) work out an analytic gradient function.
>> PJCN>
>>
>> In my earlier message, I indicated that this is a simplified version
>> of the real thing, which is the function mst.mle of package 'sn'.
>> What mst.mle does is exactly what you indicate: it re-parameterizes
>> the problem so that we always stay within the feasible region, and
>> it works with an analytic gradient function (of the transformed
>> parameters). The final outcome is the same: we land on the same
>> point.
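>>
>> (For concreteness, the device is the usual one; a sketch for a
>> single positivity-constrained parameter psi = exp(theta), with f the
>> objective in the original parameterization and psi0 a hypothetical
>> starting value:)
>>
>>  g   <- function(theta) f(exp(theta))    # unconstrained in theta
>>  fit <- optim(log(psi0), g, method="BFGS")
>>  psi.hat <- exp(fit$par)                 # back to the original scale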
>>
>> However, once the (supposed) minimum point has been found, the
>> Hessian matrix must be computed in the original parameterization
>> to get standard errors.
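>>
>> A sketch of that step, assuming the numDeriv package, with psi.hat
>> the estimate mapped back to the original scale:
>>
>>  library(numDeriv)
>>  H  <- hessian(f, psi.hat)   # f: objective, original parameterization
>>  se <- sqrt(diag(solve(H)))  # standard errors, valid only if H is p.d.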
>>
>> Adelchi Azzalini
>>
>> PJCN>
>> PJCN>
>> PJCN> > Date: Mon, 16 Dec 2013 16:09:46 +0100
>> PJCN> > From: Adelchi Azzalini <azzalini at stat.unipd.it>
>> PJCN> > To: r-help at r-project.org
>> PJCN> > Subject: [R] convergence=0 in optim and nlminb is real?
>> PJCN> > Message-ID:
>> PJCN> > <20131216160946.91858ff279db26bd65e187bc at stat.unipd.it>
>> PJCN> > Content-Type: text/plain; charset=US-ASCII
>> PJCN> >
>> PJCN> > It must be the case that this issue has been raised before,
>> PJCN> > but I did not manage to find it in past postings.
>> PJCN> >
>> PJCN> > In some cases, optim() and nlminb() declare successful
>> PJCN> > convergence, but the corresponding Hessian is not
>> PJCN> > positive-definite. A simplified version of the original
>> PJCN> > problem is given in the code which, for readability, is
>> PJCN> > placed below this text. The example is built making use of
>> PJCN> > package 'sn', but this is only required to set up the
>> PJCN> > example: the question is about the outcome of the
>> PJCN> > optimizers. At the end of the run, a certain point is
>> PJCN> > declared to correspond to a minimum, since 'convergence=0'
>> PJCN> > is reported, but the eigenvalues of the (numerically
>> PJCN> > evaluated) Hessian matrix at that point are not all
>> PJCN> > positive.
>> PJCN> >
>> PJCN> > Any views on the cause of the problem? (i) The point does
>> PJCN> > not correspond to a real minimum; (ii) it does give a
>> PJCN> > minimum, but the Hessian matrix is wrong; (iii) the
>> PJCN> > eigenvalues are not right. ...and, in either case, how does
>> PJCN> > one get the real solution?
>> PJCN> >
>> PJCN> >
>> PJCN> > Adelchi Azzalini
>> PJCN>
>>
>


