Even more timings [was Re: [R] Change in 'solve' for r-patched]

Douglas Bates bates at stat.wisc.edu
Sun Nov 2 19:00:35 CET 2003


Earlier I posted some timings on different ways to solve a least
squares problem in R.  To get a comparison of the raw speed of the
Cholesky and the QR methods I wrote a couple of C functions (shown
below) that call Lapack and BLAS routines directly.

The results from these are
> system.time(blsq <- .Call("lsq_dense_Chol", mmd, y))
[1] 0.46 0.00 0.46 0.00 0.00
> system.time(blsq <- .Call("lsq_dense_Chol", mmd, y))
[1] 0.44 0.01 0.45 0.00 0.00
> all.equal(blsq, bsparse)
[1] TRUE
> system.time(blsQR <- .Call("lsq_dense_QR", mmd, y)[1:ncol(mmd),,drop=FALSE])
[1] 0.94 0.01 0.96 0.00 0.00
> all.equal(blsQR, bsparse)
[1] TRUE
> system.time(blsQR <- .Call("lsq_dense_QR", mmd, y)[1:ncol(mmd),,drop=FALSE])
[1] 0.94 0.03 0.97 0.00 0.00

For the dense matrix methods the timings, in increasing order, are

> system.time(blsq <- .Call("lsq_dense_Chol", mmd, y))
[1] 0.44 0.01 0.45 0.00 0.00
> system.time(bc2i <- chol2inv(chol(crossprod(mmd))) %*% crossprod(mmd, y))
[1] 0.60 0.02 0.63 0.00 0.00
> system.time(bLU <- solve(crossprod(mmd), crossprod(mmd, y)))
[1] 0.64 0.06 0.71 0.00 0.00
> system.time(blsQR <- .Call("lsq_dense_QR", mmd, y)[1:ncol(mmd),,drop=FALSE])
[1] 0.94 0.01 0.96 0.00 0.00
> system.time(bLA <- qr.coef(qr(mmd, LAPACK = TRUE), y))
[1] 3.57 0.04 3.61 0.00 0.00
> system.time(bQR <- qr.coef(qr(mmd), y))
[1] 3.70 0.11 3.81 0.00 0.00

Although only the last two can handle a rank deficient model matrix I
think there is a fairly easy way of modifying the code shown below to
detect apparent singularity and accomodate for it.

I think that the reason that the Linpack-based QR method is slow is
because it is based on Linpack, not Lapack, and the reason the qr(mmd,
LAPACK=TRUE) is slow is because it does full column pivoting.

It appears that QR is a little more than twice as slow as the Cholesky
and about 1.5 times as slow as the naive calculation using an LU
decomposition. 

The team working on FLAME (http://www.cs.utexas.edu/users/flame) has
test versions of blocked algorithms for the QR and the Cholesky
decompositions that are faster than the Lapack code.  I expect to be
testing those in the next few weeks.

Here is the C code called above.  

#include "Rdefines.h"
#include "R_ext/Lapack.h"

SEXP lsq_dense_Chol(SEXP X, SEXP y)
{
    SEXP ans;
    int info, n, p, k, *Xdims, *ydims;
    double *xpx, d_one = 1., d_zero = 0.;

    if (!(isReal(X) & isMatrix(X)))
	error("X must be a numeric (double precision) matrix");
    Xdims = INTEGER(coerceVector(getAttrib(X, R_DimSymbol), INTSXP));
    n = Xdims[0];
    p = Xdims[1];
    if (!(isReal(y) & isMatrix(y)))
	error("y must be a numeric (double precision) matrix");
    ydims = INTEGER(coerceVector(getAttrib(y, R_DimSymbol), INTSXP));
    if (ydims[0] != n)
	error(
	    "number of rows in y (%d) does not match number of rows in X (%d)",
	    ydims[0], n);
    k = ydims[1];
    if (k < 1 || p < 1) return allocMatrix(REALSXP, p, k);
    ans = PROTECT(allocMatrix(REALSXP, p, k));
    F77_CALL(dgemm)("T", "N", &p, &k, &n, &d_one, REAL(X), &n, REAL(y), &n,
		    &d_zero, REAL(ans), &p);
    xpx = Calloc(p * p, double);
    F77_CALL(dsyrk)("U", "T", &p, &n, &d_one, REAL(X), &n, &d_zero,
		    xpx, &p);
    F77_CALL(dposv)("U", &p, &k, xpx, &p, REAL(ans), &p, &info);
    if (info) error("Lapack routine dposv returned error code %d", info);
    Free(xpx); 
    UNPROTECT(1);
    return ans;
}
 
    
SEXP lsq_dense_QR(SEXP X, SEXP y)
{
    SEXP ans;
    int info, n, p, k, *Xdims, *ydims, lwork;
    double *work, d_one = 1., d_zero = 0., tmp, *xvals;

    if (!(isReal(X) & isMatrix(X)))
	error("X must be a numeric (double precision) matrix");
    Xdims = INTEGER(coerceVector(getAttrib(X, R_DimSymbol), INTSXP));
    n = Xdims[0];
    p = Xdims[1];
    if (!(isReal(y) & isMatrix(y)))
	error("y must be a numeric (double precision) matrix");
    ydims = INTEGER(coerceVector(getAttrib(y, R_DimSymbol), INTSXP));
    if (ydims[0] != n)
	error(
	    "number of rows in y (%d) does not match number of rows in X (%d)",
	    ydims[0], n);
    k = ydims[1];
    if (k < 1 || p < 1) return allocMatrix(REALSXP, p, k);
    xvals = Calloc(n * p, double);
    Memcpy(xvals, REAL(X), n * p);
    ans = PROTECT(duplicate(y));
    lwork = -1;
    F77_CALL(dgels)("N", &n, &p, &k, xvals, &n, REAL(ans), &n,
		    &tmp, &lwork, &info);
    if (info)
	error("First call to Lapack routine dgels returned error code %d",
	      info);
    lwork = (int) tmp;
    work = Calloc(lwork, double);
    F77_CALL(dgels)("N", &n, &p, &k, xvals, &n, REAL(ans), &n,
		    work, &lwork, &info);
    if (info)
	error("Second call to Lapack routine dgels returned error code %d",
	      info);
    Free(work); Free(xvals);
    UNPROTECT(1);
    return ans;
}




More information about the R-help mailing list