[Rd] Optimization in R

Nicholas Lewin-Koh nikko at hailmail.net
Sat Aug 4 17:10:29 CEST 2007



Hi Andrew,
I have been working quite a bit with optim() and friends on automated
nonlinear fitting, mainly for calibration. All of the optimizers seem to
have trouble in the tricky situation of fitting a log-logistic model when
the upper asymptote is not well defined while also estimating a variance
parameter. Granted, this is not necessarily the optimizer's fault but a
combination of objective function, convergence criteria, scaling, and so
on. My point, though, is that for automated fitting (many starting values,
pseudo-likelihood iterations for the variance function, repeated over
multiple curves) the overhead of an objective function written in R rather
than C can become non-negligible, even for "simple" univariate functions.
So I have to agree with Duncan that R is very good for prototyping, but C
would be preferable.
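
For concreteness, here is a minimal sketch of the kind of fit I mean; the
four-parameter log-logistic parameterisation, the simulated data and the
joint estimation of a residual standard deviation are for illustration
only, not my production calibration code:

## illustrative sketch only, not production code
loglogistic <- function(x, b, c, d, e) {
  ## lower asymptote c, upper asymptote d, slope b, inflection point e
  c + (d - c) / (1 + exp(b * (log(x) - log(e))))
}

negloglik <- function(par, x, y) {
  mu    <- loglogistic(x, par["b"], par["c"], par["d"], par["e"])
  sigma <- exp(par["logsigma"])   # log scale keeps sigma positive
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

set.seed(1)
x <- rep(2^(0:7), each = 3)
y <- loglogistic(x, b = -1.5, c = 0.05, d = 2, e = 10) +
     rnorm(length(x), sd = 0.05)

start <- c(b = -1, c = 0, d = max(y), e = median(x), logsigma = log(sd(y)))
fit <- optim(start, negloglik, x = x, y = y, method = "BFGS",
             control = list(maxit = 500))
fit$par   # estimated curve parameters and log-sigma

The objective ends up being evaluated a great many times once this is
looped over starting values, pseudo-likelihood updates and multiple
curves, which is where the R-vs-C overhead shows up.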

I can contribute some "hard" problems if you start the package.

Nicholas
> Date: Sat, 4 Aug 2007 01:12:31 -0400
> From: Andrew Clausen <clausen at econ.upenn.edu>
> Subject: [Rd] Optimization in R
> To: r-devel at r-project.org
> Cc: help-gsl at gnu.org
> 
> Hi all,
> 
> I've been working on improving R's optim() command, which does
> general-purpose unconstrained optimization.  Obviously, this is
> important for many statistics computations, such as maximum likelihood,
> method of moments, etc.  I have focused my efforts on the BFGS method,
> mainly because it best matches my current projects.
> 
> Here's a quick summary of what I've done:
>  * implemented my own version of BFGS in R,
> 	http://www.econ.upenn.edu/~clausen/computing/bfgs.zip
>  * written a wrapper for the GNU Scientific Library's optimization
> function, multimin(),
> 	http://www.econ.upenn.edu/~clausen/computing/multimin.zip
>  * written some tricky functions to compare implementations,
> 	http://www.econ.upenn.edu/~clausen/computing/tests.zip
> 
> My own implementation has several advantages over optim()'s
> implementation (which you can see in the vmmin() function in
> 	https://svn.r-project.org/R/trunk/src/main/optim.c)
>  * the linesearch algorithm (Moré-Thuente) quickly finds a region of
> interest to zoom into.  Moreover, it strikes a much better balance
> between finding a point that adequately improves on the old point and
> not wasting too much time searching for a much better one.  (Unlike
> optim(), it uses the standard Wolfe conditions with weak parameters; a
> sketch of these conditions appears after this list.)
>  * the linesearch algorithm uses interpolation, so it finds an acceptable
> point more quickly.
>  * implements "box" constraints.
>  * easier to understand and modify the code, partly because it's written
>  in R.
> 
> Of course, this comes at the (slight?) overhead cost of being written in
> R.
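> 
> For reference, here is a minimal sketch (mine, not the Moré-Thuente code
> itself) of the weak Wolfe conditions the line search enforces for a
> trial step length alpha along direction d from the current point x:
> 
> ## illustrative check only, not the actual bfgs.zip line search
> wolfe_ok <- function(f, gr, x, d, alpha, c1 = 1e-4, c2 = 0.9) {
>   f0  <- f(x)
>   g0d <- sum(gr(x) * d)        # directional derivative at x
>   x1  <- x + alpha * d
>   decrease  <- f(x1) <= f0 + c1 * alpha * g0d     # sufficient decrease
>   curvature <- sum(gr(x1) * d) >= c2 * g0d        # curvature condition
>   decrease && curvature
> }
> 
> Typical "weak" parameter choices for quasi-Newton methods are c1 = 1e-4
> and c2 = 0.9; the search interpolates over alpha until both conditions
> hold.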
> 
> The test suite above takes the first few functions from the paper
> 
>         Moré, Garbow, and Hillstrom, "Testing Unconstrained
>         Optimization Software", ACM Trans Math Softw 7:1 (March 1981)
> 
> The test results appear below, where "*" means "computed the right
> solution" and "!" means "got stuck".
> 
> test                    optim           clausen         gsl
> --------------------------------------------------------------
> bard                                                    !
> beale
> brown-scaled
> freudenstein-roth
> gaussian                                                *
> helical-valley          *               *
> jennrich-sampson                                        *
> meyer                                                   *
> powell-scaled                           *
> rosenbrock                              *
> 
> The table indicates that all three implementations of BFGS failed to
> compute the right answer in most cases.  I suppose this means they are
> all quite deficient.  Of course, this doesn't imply that they perform
> badly on real statistics problems -- but in my limited experience with
> my crude econometric models, they do perform badly.  Indeed, that's why
> I started investigating in the first place.
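> 
> (For anyone who wants to reproduce a run with stock optim(), here is a
> minimal sketch using the Rosenbrock function from that test set, with
> its standard starting point (-1.2, 1); the minimum is at (1, 1).)
> 
> ## illustrative helpers, not part of the test harness above
> rosenbrock <- function(p) {
>   100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2
> }
> rosenbrock_grad <- function(p) {
>   c(-400 * p[1] * (p[2] - p[1]^2) - 2 * (1 - p[1]),
>      200 * (p[2] - p[1]^2))
> }
> optim(c(-1.2, 1), rosenbrock, rosenbrock_grad, method = "BFGS")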
> 
> For what it's worth, I think:
>  * the optimization algorithms should be written in R -- the overhead is
> small compared to the cost of evaluating likelihood functions anyway,
> and is easily made up by the better algorithms that are possible.
>  * it would be useful to keep a repository of interesting optimization
> problems relevant to R users.  Then R developers can evaluate
> "improvements".
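> 
> An entry in such a repository could be as simple as a list holding the
> objective, a standard start and the known solution; the field names
> below are hypothetical, just to show the shape, using the Beale problem
> from the table above:
> 
> ## hypothetical repository entry; field names are illustrative only
> problem_beale <- list(
>   name     = "beale",
>   fn       = function(p) {
>     (1.5   - p[1] * (1 - p[2]))^2 +
>     (2.25  - p[1] * (1 - p[2]^2))^2 +
>     (2.625 - p[1] * (1 - p[2]^3))^2
>   },
>   start    = c(1, 1),
>   solution = c(3, 0.5)
> )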
> 
> Cheers,
> Andrew
> 