[R] very fast OLS regression?

Ravi Varadhan rvaradhan at jhmi.edu
Wed Mar 25 23:13:24 CET 2009


Yes, Bert.  Any least-squares solution that forms X'X and then inverts it is not to be recommended.  If X is nearly rank-deficient, then X'X will be more strongly so, since the condition number of X'X is the square of that of X.  The QR decomposition approach in my byhand.qr() function is reliable and fast.

Ravi.
____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvaradhan at jhmi.edu
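The byhand.qr() function Ravi mentions is not shown anywhere in this thread; a QR-based least-squares solver of the kind he describes, which never forms X'X, might look like the following sketch (the name and body here are a hedged reconstruction, not his actual code):

```r
# Hypothetical reconstruction of a QR-based least-squares solver along
# the lines Ravi describes; the original byhand.qr() is not shown here.
byhand.qr <- function(x, y) {
  qx <- qr(x)          # QR decomposition of the design matrix X
  qr.coef(qx, y)       # solve R %*% b = Q'y by back-substitution
}
```

Because the condition number of X'X is the square of that of X, solving via the QR factors of X is markedly more accurate on ill-conditioned problems than inverting X'X.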


----- Original Message -----
From: Bert Gunter <gunter.berton at gene.com>
Date: Wednesday, March 25, 2009 6:03 pm
Subject: Re: [R] very fast OLS regression?
To: 'ivo welch' <ivowel at gmail.com>, 'Dimitris Rizopoulos' <d.rizopoulos at erasmusmc.nl>
Cc: 'r-help' <r-help at stat.math.ethz.ch>


>  lm is slow because it has to set up the design matrix (X) each time. See
>  ?model.matrix and ?model.matrix.lm for how to do this once, separately
>  from lm (and then use lm.fit).
>  
>  I am far from an expert on numerical algebra, but I'm pretty sure your
>  comments below are incorrect in the sense that "naive" methods are **not**
>  universally better than efficient numerical linear algebra methods using,
>  say, QR decompositions. It will depend very strongly on the size and
>  specific nature of the problem. Big-O complexity statements (O(n) or
>  O(n^2), etc.) are, after all, asymptotic. There is also typically a
>  tradeoff between computational accuracy (especially for ill-conditioned
>  matrices) and speed, which your remarks seem to neglect.
>  
>  -- Bert
>  
>  
>  Bert Gunter
>  Genentech Nonclinical Biostatistics
>  650-467-7374
>  
>  -----Original Message-----
>  From: r-help-bounces at r-project.org On Behalf Of ivo welch
>  Sent: Wednesday, March 25, 2009 2:30 PM
>  To: Dimitris Rizopoulos
>  Cc: r-help
>  Subject: Re: [R] very fast OLS regression?
>  
>  thanks, dimitris.  I also added Bill Dunlap's "solve(qr(x),y)"
>  function as ols5.   here is what I get in terms of speed on a Mac Pro:
>  
>  ols1 6.779 3.591 10.37 0 0
>  ols2 0.515 0.21 0.725 0 0
>  ols3 0.576 0.403 0.971 0 0
>  ols4 1.143 1.251 2.395 0 0
>  ols5 0.683 0.565 1.248 0 0
>  
>  so the naive matrix operations are fastest.  I would have thought that
>  alternatives to the naive approach I learned in my linear algebra course
>  would be quicker.  still, ols3 and ols5 are competitive.  the
>  built-in lm() is really problematic.  is ols3 (or perhaps even ols5)
>  preferable in terms of accuracy?  I can live with a 20% slowdown
>  (but not with a factor-of-10 slowdown).
>  
>  regards,
>  
>  /iaw
>  
>  
>  On Wed, Mar 25, 2009 at 5:11 PM, Dimitris Rizopoulos
>  <d.rizopoulos at erasmusmc.nl> wrote:
>  > check the following options:
>  >
>  > ols1 <- function (y, x) {
>  >    coef(lm(y ~ x - 1))
>  > }
>  >
>  > ols2 <- function (y, x) {
>  >    xy <- t(x)%*%y
>  >    xxi <- solve(t(x)%*%x)
>  >    b <- as.vector(xxi%*%xy)
>  >    b
>  > }
>  >
>  > ols3 <- function (y, x) {
>  >    XtX <- crossprod(x)
>  >    Xty <- crossprod(x, y)
>  >    solve(XtX, Xty)
>  > }
>  >
>  > ols4 <- function (y, x) {
>  >    lm.fit(x, y)$coefficients
>  > }
>  >
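The ols5 that Ivo mentions above is not shown in the thread; based on his description of Bill Dunlap's suggestion, it would presumably be:

```r
# Presumed ols5, per Bill Dunlap's "solve(qr(x), y)" described above.
# solve() on a "qr" object dispatches to solve.qr(), which obtains the
# least-squares coefficients from the QR factors rather than from X'X.
ols5 <- function (y, x) {
    solve(qr(x), y)
}
```

This matches the argument order of the other olsN functions; only x is factored, so y can be swapped cheaply against a fixed design matrix.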
>  
>  ______________________________________________
>  R-help at r-project.org mailing list
>  
>  PLEASE do read the posting guide 
>  and provide commented, minimal, self-contained, reproducible code.
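The five-column timings in Ivo's message look like system.time() output (user.self, sys.self, elapsed, user.child, sys.child). They could be reproduced with a harness along these lines; the data dimensions and repetition count here are assumptions, since the thread does not state them:

```r
# Sketch of a timing harness for the olsN functions quoted above.
# n, p, and the repetition count are assumptions, not from the thread.
set.seed(1)
n <- 1e4; p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- x %*% rnorm(p) + rnorm(n)

ols3 <- function (y, x) solve(crossprod(x), crossprod(x, y))
ols5 <- function (y, x) solve(qr(x), y)

# system.time() returns five fields, matching the columns in Ivo's table.
for (f in c("ols3", "ols5")) {
  t <- system.time(for (i in 1:20) get(f)(y, x))
  cat(f, t, "\n")
}
```

On well-conditioned data both routes agree to high precision; the accuracy gap Bert and Ravi discuss only shows up as X approaches rank deficiency.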



