[R] very fast OLS regression?

Douglas Bates bates at stat.wisc.edu
Sat Mar 28 15:07:02 CET 2009


On Wed, Mar 25, 2009 at 5:15 PM, Gavin Simpson <gavin.simpson at ucl.ac.uk> wrote:
> On Wed, 2009-03-25 at 16:28 -0400, ivo welch wrote:
>> Dear R experts:
>>
>> I just tried some simple test that told me that hand computing the OLS
>> coefficients is about 3-10 times as fast as using the built-in lm()
>> function.   (code included below.)  Most of the time, I do not care,
>> because I like the convenience, and I presume some of the time goes
>> into saving a lot of stuff that I may or may not need.  But when I do
>> want to learn the properties of an estimator whose input contains a
>> regression, I do care about speed.
>>
>> What is the recommended fastest way to get regression coefficients in
>> R?  (Is Gentlemen's weighted-least-squares algorithm implemented in a
>> low-level C form somewhere?  that one was always lightning fast for
>> me.)
>
> No one has yet mentioned Doug Bates' article in R News on this topic,
> which compares timings of various methods for least squares
> computations.
>
> Douglas Bates. Least squares calculations in R. R News, 4(1):17-20, June
> 2004.
>
> A Cholesky decomposition solution proved fastest in base R code, with an
> even faster version developed using sparse matrices and the Matrix
> package.
>
> You can find Doug's article here:
>
> http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf

An updated version is available as the "Comparisons" vignette in the
Matrix package.

>>
>> regards,
>>
>> /ivo
>>
>>
>>
>> bybuiltin = function( y, x )   coef(lm( y ~ x -1 ));
>>
>> byhand = function( y, x ) {
>>   xy<-t(x)%*%y;
>>   xxi<- solve(t(x)%*%x)
>>   b<-as.vector(xxi%*%xy)
>>   ## I will need these later, too:
>>   ## res<-y-as.vector(x%*%b)
>>   ## soa[i]<-b[2]
>>   ## sigmas[i]<-sd(res)
>>   b;
>> }
>>
>>
>> MC=500;
>> N=10000;
>>
>>
>> set.seed(0);
>> x= matrix( rnorm(N*MC), nrow=N, ncol=MC );
>> y= matrix( rnorm(N*MC), nrow=N, ncol=MC );
>>
>> ptm = proc.time()
>> for (mc in 1:MC) byhand(y[,mc],x[,mc]);
>> cat("By hand took ", proc.time()-ptm, "\n");
>>
>> ptm = proc.time()
>> for (mc in 1:MC) bybuiltin(y[,mc],x[,mc]);
>> cat("By built-in took ", proc.time()-ptm, "\n");
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
>  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
>  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
>  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
>  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>




More information about the R-help mailing list