[R] suggestions for nls error: false convergence

Sun Dec 18 21:16:48 CET 2005

	  I generally prefer "optim" to "nlminb", because it will optionally 
return the hessian, and a review of the eigenvalues and vectors can help 
diagnose problems like this.  When I tried "optim" with your "func", I 
got an error message:

 > est. <- optim( c(277, 100,101, 10), func,
+                hessian=TRUE)
Error in optim(c(277, 100, 101, 10), func, hessian = TRUE) :
	objective function in optim evaluates to length 100 not 1

	  For both "optim" and "nlminb", the second argument is a "function to 
be minimized", which must return a scalar.  Your "func" returned a 
vector of length 100 = length(x);  "nlminb" tried to minimize only the 
first component of that vector without reference to y!

	  I therefore modified your "func" as follows and tried it with 
"nlminb" and with three methods of "optim":

func1 <- function( par,y, x ) {
a = par[1]
m = par[2]
n = par[3]
tau = par[4]
y. <- a * (1+m*exp(-x/tau)) / (1+n*exp(-x/tau))
sum((y-y.)^2)
}

est.n <- nlminb( c(277, 100,101, 10), objective=func1,
control=list(eval.max=400, iter.max=1000), y=y, x=x)

est.o1 <- optim( c(277, 100,101, 10), func1,
                hessian=TRUE, y=y, x=x)
est.o2 <- optim( c(277, 100,101, 10), func1,
        method="BFGS", hessian=TRUE, y=y, x=x)
est.o3 <- optim( c(277, 100,101, 10), func1,
        method="CG", hessian=TRUE, y=y, x=x)
est.o4 <- optim( c(277, 100,101, 10), func1,
        method="SANN", hessian=TRUE, y=y, x=x)

	  These 5 optimizers found the following "minima", returning 
"convergence" messages as follows:

nlminb:  $objective=808.7282;  $convergence=0, "relative convergence (4)"

optim(method="Nelder-Mead"(default)):  $value=9032.814 (11 times 
nlminb);  $convergence=1 (the iteration limit 'maxit' had been reached.)

optim(method="BFGS"):  $value=1189625 (1471 times nlminb); 
$convergence=0 ("successful convergence")

optim(method="CG"):  $value=1189574 (1471 times nlminb);  $convergence=1 
(the iteration limit 'maxit' had been reached.)

optim(method="SANN"):  $value=42819.06 (53 times nlminb); 
$convergence=0 ("successful convergence")

	  Clearly, nlminb got the smallest residual sum of squares.  To get the 
hessian there, I fed the nlminb output into optim as follows:

est.no1 <- optim(est.n$par, func1,
                hessian=TRUE, y=y, x=x)
est.no.hess.eig <- eigen(est.no1$hessian, symmetric=TRUE)
signif(est.no.hess.eig$values, 2)
[1]  6.9e+05  3.2e+01  1.6e-04 -1.0e-07

	  Since the smallest eigenvalue is negative, we might think that the 
hessian is indefinite.  However, the smallest eigenvalue is less than 
1e-12 times the largest in absolute value, which really means that the 
smallest eigenvalue is essentially zero.  Moreover, since the third 
eigenvalue is less than 1e-9 times the largest, the true rank of the 
hessian is at most 2.  If we also compare the first and second 
eigenvalues, we that the first is over 2000 times the second.  This says 
that with your model and your data, you really can only estimate one 
parameter;  everything else looks like noise, to me.

	  The eigenvectors provide more detail:

 > round(est.no.hess.eig$vectors, 2)
       [,1] [,2]  [,3] [,4]
[1,] -0.01 1.00  0.00 0.00
[2,]  0.00 0.00  1.00 0.01
[3,]  0.00 0.00 -0.01 1.00
[4,]  1.00 0.01  0.00 0.00
 >
	  This says that your data can estimate the fourth parameter very well, 
as the largest eigenvalue is associated almost exclusively with that 
parameter.  You might also be able to estimate the second parameter 
somewhat, as it is associated almost exclusively with the second 
eigenvalue.  However, you have to move the second and third parameters 
much more than the first and fourth to see much change in the residual 
sum of squares.

	  For more information, I suggest you consult the following:

Seber and Wild (1988) Nonlinear Regression (Wiley;  reprinted in 2003, 
esp. ch. 7 on "Growth Models")

Bates and Watts (1988) Nonlinear Regression Analysis and Its 
Applications (Wiley, esp. ch. 6 on "Graphical Summaries")

	  hope this helps.
	  spencer graves	

Rajarshi Guha wrote:

> Hi,
>   I'm trying to fit some data using a logistic function defined as
> 
> y ~ a * (1+m*exp(-x/tau)) / (1+n*exp(-x/tau)
> 
> My data is below:
> 
> x <- 1:100
> 
> y <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
> 0,0,0,0,0,1,1,1,2,2,2,2,2,3,4,4,4,5,
> 5,5,5,6,6,6,6,6,8,8,9,9,10,13,14,16,19,21,
> 24,28,33,40,42,44,50,54,69,70,93,96,110,127,127,141,157,169,
> 178,187,206,216,227,236,238,244,246,250,255,255,257,260,261,262,266,268,
> 268,270,272,272,272,273,275,275,275,276)
> 
> My first attempt was to use nls as below:
> 
>     d <- data.frame(x=x, y=y)
>     model <- nls(y ~ a * (1+m*exp(-x/tau)) / (1+n*exp(-x/tau)), data=d,
>     start=list(a=277,m=100,n=101,tau=10), 
>     algorithm='port', trace=TRUE,
>     control=nls.control(maxiter=5000, minFactor=1/2048))
> 
> Running the above code I get the following error message:
> 
> Convergence failure: function evaluation limit reached without
> convergence (9).
> 
> To investigate this further I used nlminb() to get a set of starting
> parameters. Thus I did:
> 
> func <- function( par ) {
>     a = par[1]
>     m = par[2]
>     n = par[3]
>     tau = par[4]
>     a * (1+m*exp(-x/tau)) / (1+n*exp(-x/tau))
> }
> est <- nlminb( c(277, 100,101, 10), objective=func,
> control=list(eval.max=400, iter.max=1000))
> 
> I get absolute convergence and a set of parameter values. Plugging these
> into the nls call and trying again still gives me
> 
>  Convergence failure: function evaluation limit reached without
> convergence (9)
> 
> I have tried a number of different starting values for the nls() call
> but I usually end up getting the following error:
> 
> Convergence failure: false convergence (8)
> 
> After reading the PORT library docs, I see that this error can mean
> 
> 1) gradient is calculated incorrectly
> 2) stopping tolerances are too tight
> 3) gradient is discontinous near some iterate
> 
> However, since nls() usually reports the above error after 30 to 40
> iterations, the PORT docs suggest that it is not problem 1. I'm not sure
> about (3) - I have other data which are somewhat similar to the above
> data, but they lead to a straightforward fit.
> 
> 
> 
> In the end I tried a different starting value and lowered the tolerances
> a little, and I got a valid fit
> 
> My questions are:
> 
> 1) Why would the parameters that lead nlminb() to converge to work in
> nls() (since I'm using the PORT algorithm in both cases)?
> 
> 2) Is there a strategy to obtain starting values? In my case I know that
> a should be around 277, but for the others I'm not sure. 
> 
> 3) Is there a quick way to check whether the gradient is discontinous at
> a point, numerically (rather than calculating the analytical
> derivatives)? I did 
> 
> plot(diff(y))
> 
> and it certainly looks messy, but I also have other y vectors which look
> equally jagged (though the jaggedness occurs at lower x)
> 
> Any suggestions would be appreciated.
> 
> Thanks,
> 
> -------------------------------------------------------------------
> Rajarshi Guha <rxg218 at psu.edu> <http://jijo.cjb.net>
> GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
> -------------------------------------------------------------------
> After a number of decimal places, nobody gives a damn.
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

spencer.graves at pdf.com
www.pdf.com <http://www.pdf.com>
Tel:  408-938-4420
Fax: 408-280-7915