[R] Very slow optim()
deepayan.sarkar sending from gmail.com
Sat Mar 13 12:56:08 CET 2021
On Sat, Mar 13, 2021 at 4:33 PM Spencer Graves
<spencer.graves using effectivedefense.org> wrote:
> Hi, Deepayan:
> On 2021-03-13 01:27, Deepayan Sarkar wrote:
> > On Sat, Mar 13, 2021 at 10:08 AM Spencer Graves
> > <spencer.graves using effectivedefense.org> wrote:
> >> TWO COMMENTS:
> >> 1. DID YOU ASSIGN THE OUTPUT OF "optim" to an object, like "est <-
> >> optim(...)"? If yes and if "optim" terminated normally, the 60,000+
> >> parameters should be there as est$par. See the documentation on "optim".
> >> 2. WHAT PROBLEM ARE YOU TRYING TO SOLVE?
> >> I hope you will forgive me for being blunt (or perhaps bigoted), but
> >> I'm skeptical about anyone wanting to use optim to estimate 60,000+
> >> parameters. With a situation like that, I think you would be wise to
> >> recast the problem as one in which those 60,000+ parameters are sampled
> >> from some hyperdistribution characterized by a small number of
> >> hyperparameters. Then write a model where your observations are sampled
> >> from distribution(s) controlled by these random parameters. Then
> >> multiply the likelihood of the observations by the likelihood of the
> >> hyperdistribution and integrate out the 60,000+ parameters, leaving only
> >> a small number of hyperparameters.
> > Just a comment on this comment: I think it's perfectly reasonable to
> > optimize 60k+ parameters with conjugate gradient. CG was originally
> > developed to solve linear equations of the form Ax=b. If x were not
> > large, one would just use solve(A, b) instead of an iterative
> > method.
> > Use of CG is quite common in image processing. A relatively small
> > 300x300 image will give you 90k parameters.
> > -Deepayan
> Thanks for this.
> If both A and b are 300x300, then x will also be 300x300.
Sorry for being unclear: the images themselves (b or x) are viewed as
a vector, so would be 90000x1. A would be 90000x90000, so essentially
impossible to construct. CG can solve Ax=b as long as Ax can be
evaluated (for arbitrary x).
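To make this concrete, here is a minimal sketch (the operator is made up for illustration, not from any real image problem): it solves Ax = b with optim(method = "CG") where A is never formed, only a function computing A %*% x. Here A = I + t(D) %*% D for the first-difference matrix D, which is symmetric positive definite.

```r
## Matrix-free CG: solve A x = b where A is available only as a function.
## A = I + t(D) %*% D (a hypothetical smoothing operator), symmetric p.d.
Ax <- function(x) {
  d <- diff(x)            # D %*% x, first differences
  x + c(-d, 0) + c(0, d)  # x + t(D) %*% d, without ever forming D or A
}

n <- 1000
b <- sin(seq(0, 2 * pi, length.out = n))

## Minimizing f(x) = 0.5 x'Ax - b'x is equivalent to solving Ax = b,
## because the gradient of f is exactly Ax - b.
fit <- optim(par = numeric(n),
             fn = function(x) 0.5 * sum(x * Ax(x)) - sum(b * x),
             gr = function(x) Ax(x) - b,
             method = "CG",
             control = list(maxit = 10000, reltol = 1e-12))

max(abs(Ax(fit$par) - b))  # residual of Ax = b; should be near zero
```

The same pattern scales to the 90000x1 image case: only the function computing Ax needs to fit in memory, not the 90000x90000 matrix.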
> What do you do in this case if A is not square or even ill-conditioned?
A has to be p.d.
> Do you care if you get only one of many possible or approximate
> solutions, and the algorithm spends most of its time making adjustments
> in a singular subspace that would have best been avoided?
Well, in my experience, optim(method="CG") behaves quite badly if A is
not full-rank. I don't think other implementations will be any better,
but I am not sure.
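A small made-up illustration of the many-solutions point: with A = t(D) %*% D, which is singular (constant vectors are in its null space), and b chosen to lie in the range of A, different starting values lead CG to different minimizers that differ by roughly a constant shift yet have essentially the same objective value.

```r
## Rank-deficient example: A = t(D) %*% D has rank n-1, so Ax = b has
## infinitely many solutions whenever b is in the range of A.
set.seed(1)
n <- 50
D <- diff(diag(n))          # first-difference matrix, (n-1) x n
A <- crossprod(D)           # t(D) %*% D, p.s.d. but singular
b <- drop(A %*% rnorm(n))   # guarantees b is in the range of A

f  <- function(x) 0.5 * sum(x * (A %*% x)) - sum(b * x)
gr <- function(x) drop(A %*% x) - b

fit0 <- optim(rep(0, n), f, gr, method = "CG",
              control = list(maxit = 10000, reltol = 1e-12))
fit1 <- optim(rep(1, n), f, gr, method = "CG",
              control = list(maxit = 10000, reltol = 1e-12))

fit0$value - fit1$value     # objective values essentially equal...
mean(fit1$par - fit0$par)   # ...but the solutions differ by ~ a constant
```

The null-space component of the starting value is (approximately) preserved by the CG iterates, which is why the two runs land on different members of the solution set. If b has a component outside the range of A, the objective is unbounded below and the iteration degenerates, which matches the bad behavior described above.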
If you are interested in practical uses of this, we can discuss more off-list.