[R] Change in 'solve' for r-patched

Williams, Elliot - BLS Williams.Elliot at bls.gov
Fri Oct 31 20:50:10 CET 2003


Hi all,

Thanks Douglas for bringing up numerical stability and OLS regression
coefficients.  I'm often worried about the speed of regression runs, as I
sometimes run large simulations or resamplings.  Inspired, and prone to run
many regressions, I did a little simulation as follows.  XP results are on a
2.4Ghz Pentium 4 in Windows from the binaries (without BLAS?) and Linux
results are on a 1.5Ghz Pentium M laptop with a Pentium Atlas BLAS.

Setting up the data: x is rnormals with a vector of ones prepended... 
> x <- cbind(rep(1,100), matrix(rnorm(1000), nc=10))
> beta0 <- runif(11)
> y <- x %*% beta0 + rnorm(100)

Estimating Slope Coefficients:

> system.time(for (i in 1:1000) lm( y~ x -1))
XP:  [1] 5.91 0.00 5.90   NA   NA
Linux: [1] 5.27 0.01 5.28 0.00 0.00

> system.time(for (i in 1:1000) solve(t(x) %*% x) %*% t(x) %*% y)
XP: [1] 0.64 0.01 0.65   NA   NA
Linux: [1] 0.51 0.01 0.57 0.00 0.00

> system.time(for (i in 1:1000) qr.coef(qr(x), y) )
XP: [1] 0.75 0.00 0.75   NA   NA
Linux: [1] 0.76 0.00 0.77 0.00 0.00

> system.time(for (i in 1:1000) solve(crossprod(x), crossprod(x,y)))
XP: [1] 0.45 0.00 0.53   NA   NA
Linux: [1] 0.35 0.00 0.36 0.00 0.00

On both platforms, BLAS or not, the solve(crossprod()) method works the
fastest, with the naïve solve(t(x)%*%x) second-fastest.  

Calculating (t(x)%*%x)^-1:

> system.time(for (i in 1:1000) chol2inv(qr.R(qr(x))))
XP: [1] 0.44 0.00 0.44   NA   NA
Linux: [1] 0.40 0.00 0.40 0.00 0.00

> system.time(for (i in 1:1000) solve(crossprod(x)) )
XP: [1] 0.58 0.01 0.59   NA   NA
Linux: [1] 0.34 0.00 0.34 0.00 0.00

Where the chol2inv() method was faster in Windows but slower in Linux, which
I'm guessing is due to differential uses of BLAS functions.

None of these address the problem of numerical accuracy, which was the
thrust of Doug's comments.  Has anyone done a quick simulation to
investigate the stability of the solutions?  Is it small coefficients or
near-collinearity (or both) that one has to worry about?  

Are the solve(crossprod()) methods obviously unstable?  They surely do work
quickly!

Thanks again R-universe.  I've needed no other statistical software for the
last 2 years.  I hope to get around to contributing a package or two soon.

						Elliot.




More information about the R-help mailing list