[R] optim bug/help?

Ravi Varadhan RVaradhan at jhmi.edu
Thu Oct 23 22:55:54 CEST 2008


Hi Kevin,

There is no substitute for knowing your problem as well as you possibly can
(I know this can be challenging in complicated, real problems).  If you plot
the contours of the function, you will see that they are banana-shaped
around the solution, i.e., the objective function is not very sensitive to
changes along the "length" of the banana, but changes rapidly in the
perpendicular direction.  A concise way of saying all this is that the
problem is ill-conditioned.  A consequence of such ill-conditioning is that
the numerical solution becomes highly sensitive to perturbations in "data",
where "data" includes all the parameters (related to both the function and
the numerical algorithm) that are required to solve the problem.  The step
size used to compute the finite-difference approximation of the gradient is
one such datum with a significant impact on the solution, as Ben pointed
out, precisely because of this ill-conditioning.  The starting value is
another crucial piece of data, as Patrick pointed out.
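
For example, a quick sketch of those contours (the grid range and contour
levels below are arbitrary choices, just for illustration):

x1 <- seq(-1.5, 2, length.out = 200)
x2 <- seq(-0.5, 3, length.out = 200)
z <- outer(x1, x2, function(a, b) 100 * (b - a^2)^2 + (1 - a)^2)  ## Rosenbrock
contour(x1, x2, z, levels = c(0.1, 1, 5, 25, 100, 500),
        xlab = "x1", ylab = "x2")
points(1, 1, pch = 19)   ## the minimum at (1, 1)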

In the Rosenbrock example, the problem is due to the scaling factor "100" in
the Rosenbrock function.  Increase this, and the problem becomes more
severe.  
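
For instance, a small experiment along these lines (the generalized scale
argument 's' is just for illustration, not part of the original example)
shows the effect:

frs <- function(x, s = 100) {   ## Rosenbrock function with adjustable scale
    x1 <- x[1]
    x2 <- x[2]
    s * (x2 - x1 * x1)^2 + (1 - x1)^2
}
optim(c(-1.2, 1), frs, s = 100)$par     ## the standard problem
optim(c(-1.2, 1), frs, s = 10000)$par   ## larger scale: typically further from (1, 1)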

One way to get a sense of the degree of ill-conditioning is to look at the
ratio of the largest to smallest eigenvalue of the inverse of the Hessian
matrix.

 ans <- optim(c(-1,1), fr, grr, method = "BFGS", hessian = TRUE)  ## 'grr' is the analytic gradient from the ?optim example
 eigen(solve(ans$hessian))$values   ## eigenvalues of the inverse Hessian
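
If you prefer a single number, the condition number itself can be computed
along these lines (just one way to do it):

 ev <- eigen(solve(ans$hessian))$values
 max(ev) / min(ev)                  ## large ratio indicates ill-conditioning
 kappa(ans$hessian, exact = TRUE)   ## same idea, computed directly from the Hessian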

Hope this is helpful,
Ravi.

-----------------------------------------------------------------------------------

Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan at jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-----------------------------------------------------------------------------------


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of rkevinburton at charter.net
Sent: Thursday, October 23, 2008 4:03 PM
To: r-help at stat.math.ethz.ch; Ben Bolker
Subject: Re: [R] optim bug/help?

I was just trying to get a feel for the general strengths and weaknesses of
the algorithms. This is obviously a "demo" function and probably not
representative of the "real" optimization problems that I will face. My
concern was that if there is inaccuracy when I know the answer, it might be
worse when I don't, especially if I don't understand the characteristics of
the algorithms involved.

Kevin

---- Ben Bolker <bolker at ufl.edu> wrote: 
>  <rkevinburton <at> charter.net> writes:
> 
> > 
> > In the documentation for 'optim' it gives the following function:
> > 
> > fr <- function(x) {   ## Rosenbrock Banana function
> >     x1 <- x[1]
> >     x2 <- x[2]
> >     100 * (x2 - x1 * x1)^2 + (1 - x1)^2
> > }
> > optim(c(-1.2,1), fr)
> > 
> > When I run this code I get:
> > 
> > $par
> > [1] 1.000260 1.000506
> > 
> > I am sure I am missing something but why isn't 1,1 a better answer?
> > If I plug 1,1 in the function it seems that the function is zero.
> > Whereas the answer given gives a function result of 8.82e-8. This was
> > after 195 calls to the function (as reported by optim). The
> > documentation indicates that the 'reltol' is about 1e-8. Is this the
> > limit that I am bumping up against?
> > 
> > Kevin
> > 
> 
>   Yes, this is basically just numeric fuzz, the bulk of which probably 
> comes from finite-difference evaluation of the derivative.
> As demonstrated below, you can get a lot closer by defining an 
> analytic gradient function.
> May I ask why this level of accuracy is important?
> (Insert Tukey quotation about approximate answers to the right 
> question here ...)
> 
>   Ben Bolker
> ---------
> 
> fr <- function(x) {   ## Rosenbrock Banana function
>     x1 <- x[1]
>     x2 <- x[2]
>     100 * (x2 - x1 * x1)^2 + (1 - x1)^2 }
> 
> ## gradient function
> frg <- function(x) {
>   x1 <- x[1]
>   x2 <- x[2]
>   c(100*(4*x1^3-2*x2*2*x1)-2+2*x1,
>     100*(2*x2-2*x1^2))
> }
> 
> ## use numericDeriv to double-check my calculus
> x1 <- 1.5
> x2 <- 1.7
> numericDeriv(quote(fr(c(x1,x2))),c("x1","x2"))
> frg(c(x1,x2))
> 
> ##
> optim(c(-1.2,1), fr)  ## Nelder-Mead
> optim(c(-1.2,1), fr,method="BFGS")
> optim(c(-1.2,1), fr, gr=frg,method="BFGS")  ## use gradient
> 

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


