[R] Help with faster optimization for large parameter problem

James Fowler jhfowler at ucsd.edu
Fri Mar 2 10:33:53 CET 2007


Hello all,

I have a large parameter problem with the following very simple likelihood 
function:

fn<-function(param) {
  x1<-param[1:n]
  g1<-param[(n+1):(2*n)]
  beta<-param[(2*n+1):(2*n+k)]
  sigma2<-param[2*n+k+1]^2
  meang1sp<-mean(g1[sp])
  mu<-beta%*%matrix(x1,1,n)-(g1[sp]-meang1sp)%*%matrix(g1,1,n)
  return(sum((ydc-mu)^2)/(2*sigma2) + n*k*log(sqrt(sigma2)) +
   mean(x1)^2 + mean(g1)^2 + 1000*(x1[1]>x1[n]))
}

There's nothing special here -- it's just plain old OLS, except all the 
variables on the right hand side are parameters (only the variable ydc is 
observed).  I have no problems getting this to recover parameter estimates 
from data I myself have generated for smaller problems (e.g. where ydc is 
a k=500 by n=50 matrix and there are 601 parameters).  But the target 
problem with real data will be k=6000 by n=400 with 6801 parameters.

I am currently using optim(method="BFGS") but it is slow.  I can get good 
starting values for x1 and g1 with some svd techniques, and these help me 
generate the starting values for the betas via lm().

I then use optim() on a modified likelihood function to find g1,x1,sigma2 
while holding beta fixed and then use optim() again to find beta while 
holding the other variables fixed.  But eventually, I have to run optim on 
the unmodified likelihood function above and it is very slow, taking 
several days for large problems.

I have also tried metrop() in mcmc, but I find this needs to be very 
close to the mode of the likelihood to be efficient (in fact, MCMCpack's 
metropolis function calls optim first and even requires it to invert the 
hessian before even starting the metropolis algorithm, unless we can 
provide our own covariance matrix).  I will probably use metrop() to 
generate standard errors once I find a mode....

In the mean time, I can't help thinking that there is some easy way to 
make this much faster than I am currently doing, especially since the 
likelihood is normal.  I am sure I have missed something obvious so I'd 
very much appreciate any advice you could give on packages in R or code 
that might help.

Thanks!
james

----------------------------------------------------------------------
James H. Fowler, Associate Professor   web:   http://jhfowler.ucsd.edu
Department of Political Science        email: jhfowler at ucsd.edu
University of California, San Diego    phone: (858) 534-6807
Social Sciences Building 383, #0521    fax:   (858) 534-7130
9500 Gilman Drive, La Jolla, CA 92093



More information about the R-help mailing list