[R] Very slow optim()

Sat Mar 13 12:56:08 CET 2021

On Sat, Mar 13, 2021 at 4:33 PM Spencer Graves
<spencer.graves using effectivedefense.org> wrote:
>
> Hi, Deepayan:
>
>
> On 2021-03-13 01:27, Deepayan Sarkar wrote:
> > On Sat, Mar 13, 2021 at 10:08 AM Spencer Graves
> > <spencer.graves using effectivedefense.org> wrote:
> >>
> >> TWO COMMENTS:
> >>
> >>
> >> 1.  DID YOU ASSIGN THE OUTPUT OF "optim" to an object, like "est <-
> >> optim(...)"?  If yes and if "optim" terminated normally, the 60,000+
> >> paramters should be there as est$par.  See the documentation on "optim".
> >>
> >>
> >> 2.  WHAT PROBLEM ARE YOU TRYING TO SOLVE?
> >>
> >>
> >>            I hope you will forgive me for being blunt (or perhaps bigoted), but
> >> I'm skeptical about anyone wanting to use optim to estimate 60,000+
> >> parameters.  With a situation like that, I think you would be wise to
> >> recast the problem as one in which those 60,000+ parameters are sampled
> >> from some hyperdistribution characterized by a small number of
> >> hyperparameters.  Then write a model where your observations are sampled
> >> from distribution(s) controlled by these random parameters.  Then
> >> multiply the likelihood of the observations by the likelihood of the
> >> hyperdistribution and integrate out the 60,000+ parameters, leaving only
> >> a small number hyperparameters.
> >
> > Just a comment on this comment: I think it's perfectly reasonable to
> > optimize 60k+ parameters with conjugate gradient. CG was originally
> > developed to solve linear equations of the form Ax=b. If x was not
> > large in size, one would just use solve(A, b) instead of an iterative
> > method.
> >
> > Use of CG is quite common in image processing. A relatively small
> > 300x300 image will give you 90k parameters.
> >
> > -Deepayan
> >
>
>           Thanks for this.
>
>
>           If both A and b are 300x300, then x will also be 300x300.

Sorry for being unclear: the images themselves (b or x) are viewed as
a vector, so would be 90000x1. A would be 90000x90000, so essentially
impossible to construct. CG can solve Ax=b as long as Ax can be
evaluated (for arbitrary x).

>           What do you do in this case if A is not square or even ill conditioned?

A has to be p.d.

>           Do you care if you get only one of many possible or approximate
> solutions, and the algorithm spends most of its time making adjustments
> in a singular subspace that would have best been avoided?

Well, in my experience, optim(method="CG") behaves quite badly if A is
not full-rank. I don't think other implementations will be any better,
but I am not sure.

If you are interested in practical uses of this, we can discuss more off-list.

-Deepayan

>           Spencer
>
>
>