[R] Optimization algorithm to be applied to S4 classes - specifically sparse matrices
spencer.graves at prodsyse.com
Fri May 15 18:29:47 CEST 2009
For problems with many parameters to estimate, I highly recommend
Pinheiro and Bates (2000) Mixed-Effects Models in S and S-Plus
(Springer). This book includes numerous examples showing how to use the
"nlme" package. The value of this book is greatly enhanced by the
availability of script files named, "ch01.R", "ch02.R", ... "ch08.R"
showing how to work virtually all the examples in the book. These
script files are available in your local installation of R. To find
them, enter the following at a commands prompt in R:
Hope this helps.
Dear Doug, et al.:
What would you recommend for analyzing a longitudinal abundance
survey of 22 species, when the species were not selected at random? A
prominent scientist tried to tell me that mixed-effects modeling is
inappropriate in that case because the species were selected
purposefully not at random.
My response is that even in that case, one should still use
mixed-effects modeling, because it will tend to produce more appropriate
estimates for the deviations of individual species from the average of
all species -- potentially much lower variance with slight bias -- than
naive ordinary least squares. The estimated variance components will
not represent the between-species variance for the actual population of
all hypothetical species of the particular type, but will represent the
between-species variability in a hypothetical population from which the
selected species might be considered a random sample.
p.s. I appreciate very much Doug's comment on this. I thought about
adding something like that to my reply but didn't feel I could afford
the time then.
Douglas Bates wrote:
> On Wed, May 13, 2009 at 5:21 PM, <Avraham.Adler at guycarp.com> wrote:
>> I am trying to optimize a set of parameters using /optim/ in which the
>> actual function to be minimized contains matrix multiplication and is of
>> the form:
>> SUM ((A%*%X - B)^2)
>> where A is a matrix and X and B are vectors, with X as parameter vector.
> As Spencer Graves pointed out, what you are describing here is a
> linear least squares problem, which has a direct (i.e. non-iterative)
> solution. A comparison of the speed of various ways of solving such a
> system is given in one of the vignettes in the Matrix package.
>> This has worked well so far. Recently, I was given a data set A of size
>> 360440 x 1173, which could not be handled as a normal matrix. I brought it
>> into 'R' as a sparse matrix (dgCMatrix - using sparseMatrix from the Matrix
>> package), and the formulæ and gradient work, but /optim/ returns an error
>> of the form "no method for coercing this S4 class to a vector".
> If you just want the least squares solution X then
> X <- solve(crossprod(A), crossprod(A, B))
> will likely be the fastest method where A is the sparse matrix.
> I do feel obligated to point out that the least squares solution for
> such large systems is rarely a sensible solution to the underlying
> problem. If you have over 1000 columns in A and it is very sparse
> then likely at least parts of A are based on indicator columns for a
> categorical variable. In such situations a model with random effects
> for the category is often preferable to the fixed-effects model you
> are fitting.
>> After briefly looking into methods and classes, I realize I am in way over
>> my head. Is there any way I could use /optim/ or another optimization
>> algorithm, on sparse matrices?
>> Thank you very much,
>> --Avraham Adler
>> R-help at r-project.org mailing list
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> R-help at r-project.org mailing list
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help