# [R] optim seems to be finding a local minimum

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Thu Nov 17 17:00:13 CET 2011

```
One more thing: trying to defend R's honor, I've run optimx instead of
optim (after dividing the IV by its max - same as for optim). I did
not use L-BFGS-B with lower bounds anymore. Instead, I've used Nelder-Mead.
First, it was faster: for a loop across 10 different IVs BFGS took
6.14 sec and Nelder-Mead took just 3.9 sec.
Second, the solution was better - Nelder-Mead fits were ALL better
than L-BFGS-B fits and ALL better than Excel solver's solutions. Of
course, those were small improvements, but still, it's nice!
Dimitri

On Mon, Nov 14, 2011 at 5:26 PM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> Just to provide some closure:
>
> I ended up dividing the IV by its max so that the input vector (IV) is
> now between zero and one. I still used optim:
> myopt <- optim(fn=myfunc, par=c(1,1), method="L-BFGS-B", lower=c(0,0))
> I was able to get a great fit: in 3 cases out of 10 I beat Excel
> Solver, and in 7 cases I lost to Excel, but again by really tiny
> margins (generally less than 1% of Excel's fit value).
>
> Thank you everybody!
> Dimitri
>
> On Fri, Nov 11, 2011 at 10:28 AM, John C Nash <nashjc at uottawa.ca> wrote:
>> Some tips:
>>
>> 1) Excel did not, as far as I can determine, find a solution. No point seems to satisfy
>> the KKT conditions (there is a function kktc in the optfntools package of the R-forge
>> project optimizer; it is called by optimx).
>>
>> 2) Scaling of the input vector is a good idea given the seemingly wide range of values. That
>> is, assuming this can be done. If the function depends on the relative values in the input
>> vector rather than magnitude, this may explain the trouble with your function. That is, if
>> the function depends on the relative change in the input vector and not its scale, then
>> optimizers will have a lot of trouble if the scale factor for this vector is implicitly
>> one of the optimization parameters.
>>
>> 3) If you can get the gradient function you will almost certainly be able to do better,
>> especially in finding whether you have a minimum, i.e., a null gradient and a positive
>> definite Hessian. When you have a gradient function, kktc uses the Jacobian of the gradient
>> to get the Hessian, avoiding one level of digit cancellation.
>>
>> JN
>>
>>
>> On 11/11/2011 10:20 AM, Dimitri Liakhovitski wrote:
>>> Thank you very much to everyone who replied!
>>> As I mentioned - I am not a mathematician, so sorry for stupid questions.
>>> I intuitively understand what you mean by scaling. While the solution
>>> space for the first parameter (.alpha) is relatively compact (probably
>>> between 0 and 2), the second one (.beta) is "all over the place" -
>>> because it is a function of IV (input vector). And that's, probably,
>>> my main challenge - that I am trying to write a routine for different
>>> possible IVs that I might be facing (they may be in hundreds, in
>>> thousands, in millions). Should I be rescaling the IV somehow (e.g.,
>>> by dividing it by its max) - or should I do something with the
>>> parameter .beta inside my function?
>>>
>>> So far, I've written a loop over many different starting points for
>>> both parameters. Then, I take the betas around the best solution so
>>> far, split that range into smaller steps for beta (as starting points) and
>>> optimize again for those starting points. What disappoints me is that
>>> even when I found a decent solution (the minimized value of 336) it
>>> was still worse than the Solver solution!
>>>
>>> And I am trying to prove to everyone here that we should do R, not Excel :-)
>>>
>>> Thanks again for your help, guys!
>>> Dimitri
>>>
>>>
>>> On Fri, Nov 11, 2011 at 9:10 AM, John C Nash <nashjc at uottawa.ca> wrote:
>>>> I won't requote all the other msgs, but the latest (and possibly a bit glitchy) version of
>>>> optimx on R-forge
>>>>
>>>> 1) finds that some methods wander into domains where the user function fails try() (new
>>>> optimx runs try() around all function calls). This includes L-BFGS-B
>>>>
>>>> 2) reports that the scaling is such that you really might not expect to get a good solution
>>>>
>>>> then
>>>>
>>>> 3) Actually gets a better result than the Excel Solver solution
>>>>
>>>>> xlf<-myfunc(c(0.888452533990788,94812732.0897449))
>>>>> xlf
>>>> [1] 334.607
>>>>>
>>>>
>>>> with Kelley's variant of Nelder Mead (from the dfoptim package), with
>>>>
>>>>> myoptx
>>>>  method                        par       fvalues fns  grs itns conv  KKT1
>>>> 4 LBFGSB                     NA, NA 8.988466e+307  NA NULL NULL 9999    NA
>>>> 2 Rvmmin           0.1, 200186870.6      25593.83  20    1 NULL    0 FALSE
>>>> 3 bobyqa 6.987875e-01, 2.001869e+08      1933.229  44   NA NULL    0 FALSE
>>>> 1   nmkb 8.897590e-01, 9.470163e+07      334.1901 204   NA NULL    0 FALSE
>>>>   KKT2 xtimes  meths
>>>> 4    NA   0.01 LBFGSB
>>>> 2 FALSE   0.11 Rvmmin
>>>> 3 FALSE   0.24 bobyqa
>>>> 1 FALSE   1.08   nmkb
>>>>
>>>> But do note the terrible scaling. It is hardly surprising that the optimizers struggle with
>>>> this function. I'll have to delve deeper to see what the scaling setup should be, because
>>>> the function setup involves some of the data. (optimx includes parscale on all methods.)
>>>>
>>>> However, the original poster DID include code, so it was easy to do a quick check. Good for him.
>>>>
>>>> JN
>>>>
>>>>> ## Comparing this solution to Excel Solver solution:
>>>>> myfunc(c(0.888452533990788,94812732.0897449))
>>>>>
>>>>> -- Dimitri Liakhovitski marketfusionanalytics.com
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>

--
Dimitri Liakhovitski
marketfusionanalytics.com

```
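
The rescaling discussed in this thread (dividing the IV by its max, or letting optim rescale the parameters through `parscale`) can be sketched in R. The poster's `myfunc` is not shown, so the model `alpha * (1 - exp(-iv / beta))`, the function name `sse` and the synthetic data below are hypothetical stand-ins, chosen only because their `beta`, like the poster's `.beta`, grows with the magnitude of the IV. The sketch compares a naive L-BFGS-B run on the raw IV with a run on the rescaled IV and a run that keeps the raw IV but passes `parscale`.

```r
## Hypothetical stand-in for the poster's myfunc (not shown in the thread):
## predicted = alpha * (1 - exp(-iv / beta)), fitted by sum of squared errors.
sse <- function(par, x, y) {
  pred <- par[1] * (1 - exp(-x / par[2]))
  sum((y - pred)^2)
}

## Synthetic data: IV in the hundreds of millions, true alpha = 0.9, beta = 9e7.
iv <- seq(1e6, 3e8, length.out = 60)
y  <- 0.9 * (1 - exp(-iv / 9e7))

## 1) Raw IV, start at c(1, 1): exp(-iv / beta) underflows to 0 near beta = 1,
##    so the finite-difference gradient gives no information about beta.
naive <- optim(c(1, 1), sse, x = iv, y = y,
               method = "L-BFGS-B", lower = c(0, 0))

## 2) Divide the IV by its max, as in the thread; beta is then of order one.
m <- max(iv)
scaled <- optim(c(1, 1), sse, x = iv / m, y = y,
                method = "L-BFGS-B", lower = c(0, 0))
beta_raw_units <- scaled$par[2] * m   # map beta back to the original IV units

## 3) Keep the raw IV but give optim the typical size of each parameter.
##    optim works on par / parscale internally; par and bounds stay in raw units.
ps <- optim(c(1, m), sse, x = iv, y = y,
            method = "L-BFGS-B", lower = c(0, 0),
            control = list(parscale = c(1, m)))

print(rbind(naive    = c(naive$par, naive$value),
            scaled   = c(scaled$par[1], beta_raw_units, scaled$value),
            parscale = c(ps$par, ps$value)))
```

In this toy case the rescaled run and the `parscale` run should land on essentially the same `beta` (in original IV units), because both let optim work with a `beta` of order one, where its default finite-difference step (`ndeps = 1e-3`) is meaningful; the naive run is left with an uninformative gradient for `beta`. For a real objective the appropriate scale factor may be less obvious.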
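
John Nash's points 1) and 3) suggest checking a candidate solution against the KKT conditions, preferably with an analytic gradient. The sketch below is a hand-rolled stand-in for that check, not the `kktc` function from optfntools (its interface is not shown in the thread). It assumes a hypothetical saturation model `alpha * (1 - exp(-s / beta))` fitted to synthetic data on an IV already divided by its max, supplies an analytic gradient to optim, and then tests for a near-zero gradient and a positive definite Hessian from `stats::optimHess()`, which differences the supplied gradient. The names `sse` and `sse_grad` and the tolerance `1e-3 * (1 + |f|)` are assumptions for illustration.

```r
## Hypothetical saturation model on an IV already divided by its max:
## predicted = alpha * (1 - exp(-s / beta)); true alpha = 0.9, beta = 0.3.
s <- seq(1e6, 3e8, length.out = 60) / 3e8
y <- 0.9 * (1 - exp(-s / 0.3))

sse <- function(par, s, y) {
  pred <- par[1] * (1 - exp(-s / par[2]))
  sum((y - pred)^2)
}

## Analytic gradient of the SSE with respect to (alpha, beta).
sse_grad <- function(par, s, y) {
  a <- par[1]
  b <- par[2]
  e <- exp(-s / b)
  r <- y - a * (1 - e)                 # residuals
  c(-2 * sum(r * (1 - e)),             # dSSE / dalpha
    2 * a * sum(r * e * s) / b^2)      # dSSE / dbeta
}

## A tiny positive lower bound on beta keeps the gradient finite
## (at beta = 0, sum(r * e * s) / b^2 would be 0 / 0 = NaN).
fit <- optim(c(1, 1), sse, sse_grad, s = s, y = y,
             method = "L-BFGS-B", lower = c(0, 1e-8),
             control = list(factr = 1e3))

## Rough KKT check, valid when no bound is active at the solution:
## (1) gradient close to zero, (2) Hessian positive definite.
## optimHess() builds the Hessian by finite differences of sse_grad.
g  <- sse_grad(fit$par, s, y)
H  <- optimHess(fit$par, sse, sse_grad, s = s, y = y)
ev <- eigen(H, symmetric = TRUE, only.values = TRUE)$values

kkt1 <- max(abs(g)) <= 1e-3 * (1 + abs(fit$value))  # tolerance is an assumption
kkt2 <- all(ev > 0)
cat("par:", fit$par, " KKT1:", kkt1, " KKT2:", kkt2, "\n")
```

When a lower bound is active at the solution, the gradient component for that parameter need not vanish, so this simple check would have to be adapted for that case.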