[Rd] Wrongly converging glm()

Ravi Varadhan ravi.varadhan at jhu.edu
Fri Jul 21 22:36:21 CEST 2017


“So, what I learned the hard way was termination due to reasonable stopping criteria DOES NOT NECESSARILY EQUAL OPTIMAL.”

Yes, I agree, Mark.

Let me add another observation.  In the “optimx” package, John Nash and I implemented a check of the first- and second-order KKT optimality conditions: whether the gradient is sufficiently small and the Hessian is positive definite (for a local minimum) at the final parameter values.  However, computing these quantities can be quite time consuming, and in some problems checking the KKT conditions takes more effort than finding the solution itself!  Furthermore, it is difficult to choose good thresholds for declaring the gradient “small” and the Hessian “positive definite”, since these depend on the scale of the objective function and of the parameters.
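[Editor's note: a minimal base-R sketch of the kind of KKT check described above — this is not optimx's actual code, and the finite-difference helpers and tolerances are illustrative choices only.]

```r
# Sketch of first- and second-order KKT checks at a candidate optimum,
# using central finite differences (illustrative, not optimx's implementation).
num_grad <- function(fn, par, h = 1e-6) {
  sapply(seq_along(par), function(i) {
    e <- replace(numeric(length(par)), i, h)
    (fn(par + e) - fn(par - e)) / (2 * h)
  })
}
num_hess <- function(fn, par, h = 1e-4) {
  n <- length(par)
  H <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n) {
    ei <- replace(numeric(n), i, h)
    ej <- replace(numeric(n), j, h)
    H[i, j] <- (fn(par + ei + ej) - fn(par + ei - ej) -
                fn(par - ei + ej) + fn(par - ei - ej)) / (4 * h^2)
  }
  (H + t(H)) / 2
}
kkt_check <- function(fn, par, gtol = 1e-4) {
  g  <- num_grad(fn, par)
  ev <- eigen(num_hess(fn, par), symmetric = TRUE, only.values = TRUE)$values
  # Note the scale-dependence problem: both tests below hard-code relative
  # tolerances, which is exactly the difficulty discussed above.
  list(kkt1 = max(abs(g)) < gtol * (1 + abs(fn(par))),  # gradient ~ 0
       kkt2 = min(ev) > 1e-6 * max(abs(ev)))            # Hessian positive definite
}

# At the true minimum of a simple quadratic, both conditions hold:
res <- kkt_check(function(p) sum((p - 1)^2), c(1, 1))
```

Note that the Hessian loop alone costs O(n^2) function evaluations, which is why the check can dominate the cost of the optimization itself for expensive objectives.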

Ravi

From: Mark Leeds [mailto:markleeds2 at gmail.com]
Sent: Friday, July 21, 2017 3:09 PM
To: Ravi Varadhan <ravi.varadhan at jhu.edu>
Cc: Therneau, Terry M., Ph.D. <therneau at mayo.edu>; r-devel at r-project.org; jorismeys at gmail.com; westra.harmjan at outlook.com
Subject: Re: [Rd] Wrongly converging glm()

Hi Ravi: Well said. John's Rvmmin package returns codes explaining the cause of
termination, and the codes I got back were fine. The problem was that
the model I was using could have multiple solutions (regardless of the data
sent in), so even though the stopping criterion was reached, it turned out that one of the two parameters could really have been anything and the same likelihood value would be returned. So, what I learned the hard way was that termination due to reasonable stopping criteria DOES NOT NECESSARILY EQUAL OPTIMAL. But I lived in the dark about this for a long time and only happened to notice it when playing around with the likelihood, fixing the offending parameter at various values and optimizing over the non-offending parameter. Thanks for the eloquent explanation.
                                                                                  Mark
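[Editor's note: a toy illustration of the ridge Mark describes — the model and data here are invented for illustration, not Mark's actual problem. Only the sum of the two parameters is identified, so profiling over the "offending" parameter gives a flat likelihood.]

```r
# An objective that depends on its two parameters only through their sum:
# par[1] can be anything, and par[2] compensates with no change in the fit.
set.seed(1)
y   <- rnorm(50, mean = 3)
nll <- function(par) sum((y - (par[1] + par[2]))^2)  # identified: par[1] + par[2]

# Profile over the offending parameter: fix par[1], optimize par[2].
profile_vals <- sapply(c(-10, 0, 5, 100), function(a) {
  optimize(function(b) nll(c(a, b)), interval = c(-200, 200))$objective
})
# Every fixed value of par[1] yields the same minimized objective: a flat ridge.
round(profile_vals, 6)
```

Any optimizer started on this surface will satisfy its stopping criterion at some point on the ridge, which is precisely why "converged" does not imply "uniquely optimal".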

On Fri, Jul 21, 2017 at 9:22 AM, Ravi Varadhan <ravi.varadhan at jhu.edu> wrote:
Please allow me to add my 3 cents.  Stopping an iterative optimization algorithm at an "appropriate" juncture is very tricky.  All one can say is that the algorithm terminated because it triggered a particular stopping criterion.  Good software will tell you why it stopped, i.e., which stopping criterion was triggered.  It is extremely difficult to guarantee that the triggered stopping criterion is the right one and that the answer obtained is trustworthy; it is up to the user to determine whether the answer makes sense.  In the case of maximizing a likelihood function, it is perfectly reasonable to stop when the algorithm has made no progress in increasing the log-likelihood.  In that case, the software should print something like "algorithm terminated due to lack of improvement in log-likelihood."  Therefore, I don't see a need to issue a warning; simply report the stopping criterion that was applied to terminate the algorithm.
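[Editor's note: a minimal example of "report why it stopped" in base R — optim() already follows this convention by returning a convergence code rather than any claim of optimality; the toy objective below is illustrative.]

```r
# optim() reports which stopping condition fired via its convergence code:
# 0 means the method's own criterion was met; 1 means maxit was reached, etc.
fit <- optim(par = c(0, 0),
             fn  = function(p) (p[1] - 1)^2 + (p[2] + 2)^2,
             method = "BFGS")
fit$convergence   # 0: BFGS's criterion was satisfied -- not a proof of optimality
fit$message       # method-specific explanation, when one is available
```

The code tells you which criterion terminated the run; whether the resulting parameters make sense remains, as argued above, the user's responsibility.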

Best,
Ravi

-----Original Message-----
From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Therneau, Terry M., Ph.D.
Sent: Friday, July 21, 2017 8:04 AM
To: r-devel at r-project.org; Mark Leeds <markleeds2 at gmail.com>; jorismeys at gmail.com; westra.harmjan at outlook.com
Subject: Re: [Rd] Wrongly converging glm()

I'm chiming in late since I read the news in digest form, and I won't copy the entire conversation to date.

The issue raised comes up quite often in Cox models, so often that the Therneau and Grambsch book has a section on the issue (3.5, p 58).  After a few initial iterations the offending coefficient will increase by a constant at each iteration while the log-likelihood approaches an asymptote (essentially once the other coefficients "settle down").
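[Editor's note: the analogous behaviour is easy to reproduce in glm() itself with completely separated data — the toy dataset below is invented for illustration and is not from the Therneau and Grambsch book. Under complete separation the binomial MLE does not exist, yet the IRLS deviance-change criterion is eventually met.]

```r
# Complete separation: every y = 0 has x < 3.5 and every y = 1 has x > 3.5.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(0, 0, 0, 1, 1, 1)
fit <- suppressWarnings(
  glm(y ~ x, family = binomial, control = glm.control(maxit = 100))
)
fit$converged      # TRUE: the deviance stopped changing...
coef(fit)[["x"]]   # ...but the slope has diverged toward +Inf (no finite MLE)
```

This is the glm() counterpart of the coxph behaviour described above: the deviance approaches an asymptote while the offending coefficient keeps growing, so the stopping criterion fires on a fit whose coefficient is meaningless.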

The coxph routine tries to detect this case and print a warning, and this turns out to be very hard to do accurately.  I worked hard at tuning the threshold(s) for the message several years ago and finally gave up; I am guessing that the warning misses > 5% of the cases where the issue is real, and that 5% of the warnings that do print are incorrect.  (And these estimates may be too optimistic.)  Highly correlated predictors tend to trip it up, e.g., the truncated power spline basis used by the rcs function in Hmisc.

All in all, I am not completely sure whether the message does more harm than good.  I'd be quite reluctant to go down the same path again with the glm function.

Terry Therneau
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
