[R] convergence=0 in optim and nlminb is real?

Prof J C Nash (U30A) nashjc at uottawa.ca
Tue Dec 17 23:03:34 CET 2013


I actually agree with the sentiments below -- the optimizer should
support its claims. The reality is sadly otherwise, in my view largely
because of the difficulties in computing the Hessian.

This exchange has been useful, as it highlights user expectations.
Without such dialog, we won't improve our R tools.

JN


On 13-12-17 04:54 PM, Adelchi Azzalini wrote:
> It was not my suggestion that an optimizer should check the Hessian on
> every occasion (this would be both time consuming and meaningless), but
> I expected it to do so before claiming that a point is at a minimum,
> that is, only for the candidate final point.
> 
> Nor have I ever thought that nonlinear optimization is a cursory
> operation, especially when the dimensionality is not small. Exactly for
> this reason I expect an optimizer to take stringent precautions
> before claiming to have completed its job successfully.
> 
> AA
> 
> 
> 
> On 17 Dec 2013, at 18:18, Prof J C Nash (U30A) wrote:
> 
>> As indicated, if optimizers checked Hessians on every occasion, R would
>> enrich all the computer manufacturers. In this case the problem is not
>> too large, so the check is worth doing.
>>
>> However, for this problem the Hessian is evaluated by numerically
>> approximating the second partial derivatives, so the computed Hessian
>> may bear little resemblance to the analytic one. I've seen plenty of
>> Hessian approximations that were not positive definite even though the
>> answers were OK.
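>>
>> A small illustration of the gap this can leave (Rosenbrock's test
>> function, not the problem from this thread; its analytic Hessian at the
>> minimum (1, 1) is known):
>>
>>  fr  <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
>>  opt <- optim(c(-1.2, 1), fr, method = "BFGS", hessian = TRUE)
>>  Ha  <- matrix(c(802, -400, -400, 200), 2, 2)  # analytic Hessian at (1, 1)
>>  opt$hessian - Ha  # finite-difference error plus distance from (1, 1)
>>  eigen(opt$hessian, only.values = TRUE)$values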
>>
>> That Inf is allowed does not mean that it is recommended. R is very
>> tolerant of many things that are not generally good ideas. That can be
>> helpful for some computations, but it can still cause trouble. It seems
>> not to be the problem here, though.
>>
>> I did not look at all the results for this problem from optimx, but it
>> appeared that several were lower than the optim(BFGS) one. Are any of
>> the optimx results acceptable? Note that optimx DOES offer to check the
>> KKT conditions, and defaults to doing so unless the problem is large.
>> That check was included precisely because the optimizers themselves
>> generally avoid this very expensive computation. But given the range of
>> results from the optimx runs using "all methods", I'd still want to do a
>> lot of testing of the results.
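>>
>> A minimal sketch of what I mean, assuming a hypothetical objective
>> 'negloglik' and starting vector 'start' (not the actual code from this
>> thread):
>>
>>  library(optimx)
>>  allres <- optimx(par = start, fn = negloglik,
>>                   control = list(all.methods = TRUE, kkt = TRUE))
>>  summary(allres, order = value)  # compare objective values and KKT flags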
>>
>> This may be a useful case to point out that nonlinear optimization is
>> not a calculation that should be taken for granted. It is much less
>> reliable than most users think. I rarely find ANY problem for which all
>> the optimx methods return the same answer. You really do need to look at
>> the answers and make sure that they are meaningful.
>>
>> JN
>>
>> On 13-12-17 11:32 AM, Adelchi Azzalini wrote:
>>> On Tue, 17 Dec 2013 08:27:36 -0500, Prof J C Nash (U30A) wrote:
>>>
>>> PJCN> If you run all methods in package optimx, you will see results
>>> PJCN> all over the western hemisphere. I suspect some nasty
>>> PJCN> computational issues. Possibly the replacement of the function
>>> PJCN> value with Inf when any eigenvalue < 0 or nu < 0 is one source
>>> PJCN> of this.
>>>
>>> A value Inf is allowed, as indicated in this passage from the
>>> documentation of optim:
>>>
>>>  Function fn can return NA or Inf if the function cannot be evaluated
>>>  at the supplied value, but the initial value must have a computable
>>>  finite value of fn.
>>>
>>> Incidentally, the documentation of optimx includes the same sentence.
>>>
>>> However, this aspect is not crucial anyway, since the point selected by
>>> optim is within the feasible space (by a good margin), and evaluation of
>>> the Hessian matrix occurs at this point.
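>>>
>>> As a toy illustration of that passage (not the actual likelihood), an
>>> objective returning Inf outside its feasible region causes no trouble
>>> so long as the iterates stay well inside it:
>>>
>>>  f <- function(p) if (p[2] <= 0) Inf else (p[1] - 1)^2 - log(p[2]) + p[2]
>>>  opt <- optim(c(0, 1), f, method = "BFGS", hessian = TRUE)
>>>  opt$convergence  # 0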
>>>
>>> PJCN>
>>> PJCN> Note that Hessian eigenvalues are not used to determine
>>> PJCN> convergence in optimization methods. If they were, nobody under
>>> PJCN> 100 would ever get promoted from junior lecturer, because
>>> PJCN> determining the Hessian from the function alone requires two
>>> PJCN> levels of approximate derivatives.
>>>
>>> At the end of the optimization process, when a point is about to be
>>> declared a minimum, I expect that an optimizer checks that it
>>> really *is* a minimum. It may do this in ways other than
>>> computing the eigenvalues, but it must be done somehow. Actually, I
>>> first realized the problem by attempting inversion (to get standard
>>> errors) under the assumption of positive definiteness, and it failed.
>>> For instance
>>>
>>>  mnormt:::pd.solve(opt$hessian)
>>>
>>> says  "x appears to be not positive definite". This check does not
>>> involve a further level of approximation.
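>>>
>>> The same check can also be done directly (a sketch, with 'opt' being
>>> the optim() result obtained with hessian = TRUE):
>>>
>>>  ev <- eigen(opt$hessian, symmetric = TRUE, only.values = TRUE)$values
>>>  all(ev > 0)  # TRUE only if the numerical Hessian is positive definite
>>>  # or attempt a Cholesky factorization, another standard test:
>>>  inherits(try(chol(opt$hessian), silent = TRUE), "try-error")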
>>>
>>> PJCN>
>>> PJCN> If you want to get this problem reliably solved, I think you will
>>> PJCN> need to
>>> PJCN> 1) sort out a way to avoid the Inf values -- can you constrain
>>> PJCN> the parameters away from such areas, or at least not use Inf?
>>> PJCN> The Inf values mess up the gradient computation, and hence the
>>> PJCN> optimizers and also the final Hessian.
>>> PJCN> 2) work out an analytic gradient function.
>>> PJCN>
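>>>
>>> In optim() terms I read that suggestion as something like the following
>>> schematic (here 'negll', 'negll.grad', 'start' and the bounds are
>>> placeholders, not the real problem):
>>>
>>>  opt <- optim(start, fn = negll, gr = negll.grad,
>>>               method = "L-BFGS-B", lower = c(-Inf, -Inf, 1e-6),
>>>               hessian = TRUE)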
>>>
>>> In my earlier message, I indicated that this is a simplified
>>> version of the real thing, which is function mst.mle of pkg 'sn'.
>>> What mst.mle does is exactly what you indicated, that is, it
>>> re-parameterizes the problem so that we always stay within the
>>> feasible region, and it works with an analytic gradient function (of
>>> the transformed parameters). The final outcome is the same: we land
>>> on the same point.
>>>
>>> However, once the (supposed) point of minimum has been found, the
>>> Hessian matrix must be computed on the original parameterization,
>>> to get standard errors.
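>>>
>>> That last step is along these lines (a sketch; 'negll.orig' stands for
>>> the negative log-likelihood in the original parameterization and
>>> 'par.hat' for the fitted parameters mapped back to that scale):
>>>
>>>  library(numDeriv)
>>>  H <- hessian(negll.orig, par.hat)
>>>  se <- sqrt(diag(solve(H)))  # sensible only if H is positive definite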
>>>
>>> Adelchi Azzalini
>>>
>>> PJCN>
>>> PJCN>
>>> PJCN> > Date: Mon, 16 Dec 2013 16:09:46 +0100
>>> PJCN> > From: Adelchi Azzalini <azzalini at stat.unipd.it>
>>> PJCN> > To: r-help at r-project.org
>>> PJCN> > Subject: [R] convergence=0 in optim and nlminb is real?
>>> PJCN> > Message-ID:
>>> PJCN> > <20131216160946.91858ff279db26bd65e187bc at stat.unipd.it>
>>> PJCN> > Content-Type: text/plain; charset=US-ASCII
>>> PJCN> >
>>> PJCN> > It must be the case that this issue has already been raised
>>> PJCN> > before, but I did not manage to find it in past postings.
>>> PJCN> >
>>> PJCN> > In some cases, optim() and nlminb() declare successful
>>> PJCN> > convergence, but the corresponding Hessian is not
>>> PJCN> > positive-definite.  A simplified version of the original
>>> PJCN> > problem is given in the code which, for readability, is placed
>>> PJCN> > below this text.  The example is built making use of package
>>> PJCN> > 'sn', but this is only required to set up the example: the
>>> PJCN> > question is about the outcome of the optimizers. At the end of
>>> PJCN> > the run, a certain point is declared to correspond to a minimum
>>> PJCN> > since 'convergence=0' is reported, but the eigenvalues of the
>>> PJCN> > (numerically evaluated) Hessian matrix at that point are not
>>> PJCN> > all positive.
>>> PJCN> >
>>> PJCN> > Any views on the cause of the problem? (i) the point does not
>>> PJCN> > correspond to a real minimum, (ii) it does give a minimum but
>>> PJCN> > the Hessian matrix is wrong, (iii) the eigenvalues are not
>>> PJCN> > right. ...and, whichever the case, how to get the real solution.
>>> PJCN> >
>>> PJCN> >
>>> PJCN> > Adelchi Azzalini
>>> PJCN>
>>>
>>
>


