[R] very fast OLS regression?

Bert Gunter gunter.berton at gene.com
Wed Mar 25 22:59:57 CET 2009


lm is slow because it has to set up the design matrix (X) each time. See
?model.matrix and ?model.matrix.lm  for how to do this once separately from
lm (and then use lm.fit).

I am far from an expert on numerical algebra, but I'm pretty sure your
comments below are incorrect in the sense that "naive" methods are **not**
universally better then efficient numerical algebra methods using,say, QR
decompositions. It will depend very strongly on the size and specific nature
of the problem. Big Oh Complexity statements ( O(n) or O(n^2), etc.) are,
after all, asymptotic. There is also typically a tradeoff between
computational accuracy (especially for ill-conditioned matrices)  and speed
which your remarks seem to neglect.

-- Bert


Bert Gunter
Genentech Nonclinical Biostatistics
650-467-7374

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of ivo welch
Sent: Wednesday, March 25, 2009 2:30 PM
To: Dimitris Rizopoulos
Cc: r-help
Subject: Re: [R] very fast OLS regression?

thanks, dimitris.  I also added Bill Dunlap's "solve(qr(x),y)"
function as ols5.   here is what I get in terms of speed on a Mac Pro:

ols1 6.779 3.591 10.37 0 0
ols2 0.515 0.21 0.725 0 0
ols3 0.576 0.403 0.971 0 0
ols4 1.143 1.251 2.395 0 0
ols5 0.683 0.565 1.248 0 0

so the naive matrix operations are fastest.  I would have thought that
alternatives to the naive stuff I learned in my linear algebra course
would be quicker.   still, ols3 and ols5 are competitive.  the
built-in lm() is really problematic.  is ols3 (or perhaps even ols5)
preferable in terms of accuracy?  I think I can deal with 20% speed
slow-down (but not with a factor 10 speed slow-down).

regards,

/iaw


On Wed, Mar 25, 2009 at 5:11 PM, Dimitris Rizopoulos
<d.rizopoulos at erasmusmc.nl> wrote:
> check the following options:
>
> ols1 <- function (y, x) {
>    coef(lm(y ~ x - 1))
> }
>
> ols2 <- function (y, x) {
>    xy <- t(x)%*%y
>    xxi <- solve(t(x)%*%x)
>    b <- as.vector(xxi%*%xy)
>    b
> }
>
> ols3 <- function (y, x) {
>    XtX <- crossprod(x)
>    Xty <- crossprod(x, y)
>    solve(XtX, Xty)
> }
>
> ols4 <- function (y, x) {
>    lm.fit(x, y)$coefficients
> }
>

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list