[R] R versus SAS: lm performance

Douglas Bates bates at stat.wisc.edu
Tue May 11 14:07:14 CEST 2004


<Arne.Muller at aventis.com> writes:

> Hello,
> 
> A colleague of mine has compared the runtime of a linear model + ANOVA in SAS and S+. He got the same results, but SAS took a bit more than a minute whereas S+ took 17 minutes. I've tried it in R (1.9.0) and it took 15 min. None of the machines ran out of memory, and I assume that all machines have similar hardware, but the S+ and SAS machines run Windows whereas the R machine runs Red Hat Linux 7.2.
> 
> My question is whether I'm doing something wrong (technically) in calling the lm routine, or, if not, how I can optimize the call to lm or use an alternative to lm. I'd like to run about 12,000 of these models in R (for a gene expression experiment, one model per gene), which at this speed would take far too long.
> 
> I've run the following code in R (and S+):

...

As Brian Ripley mentioned, you could save the model matrix and use it
with each of your responses.  Versions 0.8-1 and later of the Matrix
package have a vignette that provides comparative timings of various
ways of obtaining the least squares estimates.  If you use the classes
from the Matrix package and create and save the crossproduct of the
model matrix

mm = as(model.matrix(Va ~ Ba+Ti..., df), "geMatrix")
cprod = crossprod(mm)

then successive calls to

coef = solve(cprod, crossprod(mm, df$Va))

will produce the coefficient estimates much faster than repeated calls
to lm, each of which redoes all the work of generating and decomposing
the very large model matrix.
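A minimal base-R sketch of this idea, on simulated data: the factor names Ba and Ti are borrowed from the formula above, but the data frame, the sizes n and G, and the response matrix Y are hypothetical stand-ins for the gene data. Because every gene shares the same model matrix, the responses can also be stacked as columns of one matrix, so that a single call to solve() handles all of them with one factorization of X'X.

```r
# Hypothetical simulated design; Ba and Ti echo the post's formula.
set.seed(1)
n <- 60                                    # observations (e.g. arrays)
G <- 500                                   # hypothetical number of genes
df <- data.frame(Ba = gl(3, 20), Ti = gl(4, 5, n))

X <- model.matrix(~ Ba + Ti, df)           # one design shared by all genes
Y <- matrix(rnorm(n * G), n, G)            # one response column per gene

XtX <- crossprod(X)                        # X'X, computed only once

# Normal equations X'X B = X'Y for all genes at once: solve() factors
# the small p x p matrix X'X a single time and reuses it for every column.
B <- solve(XtX, crossprod(X, Y))
rownames(B) <- colnames(X)

# Spot check: gene 1 against an ordinary lm fit (expected: TRUE)
fit1 <- lm(Y[, 1] ~ Ba + Ti, data = df)
all.equal(unname(B[, 1]), unname(coef(fit1)))
```

Calling solve(XtX, crossprod(X, y)) in a loop, one gene at a time, gives the same estimates; the matrix form simply avoids repeating the factorization.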

Note that this method only produces the coefficient estimates, which
may be enough for your purposes.  Also, this method will not handle
missing data or rank-deficient model matrices in the elegant way that
lm does.
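To make the rank-deficiency caveat concrete, here is a small base-R sketch with a hypothetical design in which one factor (Dup) is an exact copy of another (Ba), so the model matrix has aliased columns. The cross-product route fails outright because X'X is singular, while lm detects the aliasing and reports NA for the redundant coefficient.

```r
set.seed(2)
n <- 12
df <- data.frame(Ba = gl(2, 6), y = rnorm(n))
df$Dup <- df$Ba                  # hypothetical aliased copy of Ba

X <- model.matrix(y ~ Ba + Dup, df)
XtX <- crossprod(X)              # singular: columns Ba2 and Dup2 coincide

# Normal-equations route: solve() stops with an "exactly singular" error,
# which we catch and turn into NULL.
res <- tryCatch(solve(XtX, crossprod(X, df$y)), error = function(e) NULL)
is.null(res)                     # TRUE

# lm uses a pivoted QR decomposition and returns NA for the aliased term
cf <- coef(lm(y ~ Ba + Dup, data = df))
cf
```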

If you are doing this 12,000 times it may be worthwhile checking if
the sparse matrix formulation

mmS = as(mm, "cscMatrix")
cprodS = crossprod(mmS)

is faster.
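A sketch of the sparse variant, assuming a current release of the Matrix package. As far as I know, the classes called geMatrix and cscMatrix above have since been renamed (the general dense and compressed sparse column classes are now dgeMatrix and dgCMatrix), so this sketch builds the sparse model matrix directly with sparse.model.matrix() rather than converting with as(). The data are simulated; the column names Va, Ba and Ti follow the post's formula but are otherwise hypothetical.

```r
library(Matrix)

# Hypothetical simulated data with two crossed factors
set.seed(3)
n <- 200
df <- data.frame(Ba = gl(4, 50), Ti = gl(10, 5, n), Va = rnorm(n))

# Sparse model matrix (a dgCMatrix in current Matrix releases)
mmS <- sparse.model.matrix(Va ~ Ba + Ti, df)
cprodS <- crossprod(mmS)         # sparse symmetric X'X, computed once

# Sparse Cholesky-based solve of the normal equations for one response
coefS <- solve(cprodS, crossprod(mmS, df$Va))

# Compare with lm (expected: TRUE)
cf_lm <- unname(coef(lm(Va ~ Ba + Ti, data = df)))
all.equal(as.vector(as.matrix(coefS)), cf_lm)
```

Whether this beats the dense version depends on the size and sparsity of the design, so timing both on the real model matrix (for example with system.time()) is the safest way to decide.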

The dense matrix formulation (but not the sparse one) can benefit from
installing optimized BLAS routines such as ATLAS or Goto's BLAS.

-- 
Douglas Bates                            bates at stat.wisc.edu
Statistics Department                    608/262-2598
University of Wisconsin - Madison        http://www.stat.wisc.edu/~bates/



