[R] Quickly reading data into the Matrix package's sparse formats

Paul Bailey pdbailey at umd.edu
Tue Jun 17 03:31:23 CEST 2008

I have a data set that I wish to solve with the Matrix package's sparse
matrix functionality. The speed improvements it achieves are amazing:
in the cases I have been able to time so far, problems I used to solve
with dense matrices never take long enough to really time. However,
before I can solve my full linear model, I need to be able to read in
all the data, and therein lies the rub. I see two ways to read it in:

(1) generate a dense X matrix and then convert it to a sparse matrix
using, e.g.,

R> require(Matrix)
R> Xsparse <- as(X,"dgCMatrix")

(2) make a new sparse X matrix and then populate it:

R> require(Matrix)
R> Xsparse <- Matrix(0,nrow=n,ncol=m,sparse=TRUE)

then, for each relevant cell (i,j) with value v:

R> Xsparse[i,j] <- v
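
To make the comparison concrete, here is a minimal sketch of both
approaches on made-up data. The dimensions n and m, the count k, and
the random cells are hypothetical stand-ins, not my real problem, and
I use "CsparseMatrix" as the conversion target on the assumption that
it gives the same dgCMatrix result for a general numeric matrix.

```r
library(Matrix)

set.seed(1)
n <- 200; m <- 50   # hypothetical dimensions
k <- 500            # hypothetical number of cells to draw

# Distinct (row, column) positions and one value per position.
cells <- unique(cbind(sample(n, k, replace = TRUE),
                      sample(m, k, replace = TRUE)))
v <- rnorm(nrow(cells))

# Method 1: fill a dense matrix, then convert it to sparse storage.
X <- matrix(0, nrow = n, ncol = m)
X[cells] <- v
t1 <- system.time(X1 <- as(X, "CsparseMatrix"))

# Method 2: start from an empty sparse matrix and assign cell by cell.
X2 <- Matrix(0, nrow = n, ncol = m, sparse = TRUE)
t2 <- system.time(
  for (r in seq_len(nrow(cells))) {
    X2[cells[r, 1], cells[r, 2]] <- v[r]
  }
)

# Both routes should hold the same values; only the cost differs.
same <- isTRUE(all.equal(as.matrix(X1), as.matrix(X2)))
print(same)
print(c(convert = t1[["elapsed"]], fill = t2[["elapsed"]]))
```

The timings at this toy size are only illustrative; the gap I describe
shows up on the full data.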

But both of these methods are painfully slow. Method 1 takes many
times as long as the actual solving and, what's worse, the conversion
plus the sparse solve ends up only about 1/2 as time consuming as the
dense solver, all told.
It also requires that a dense version of X approximately fit in
memory. Method 2 is significantly slower still, taking more than a
factor of 10 longer than the dense solver. For method 2 I tried both
dgCMatrix and dgTMatrix, with little difference. I've searched through
the documentation for the Matrix package, and there is no mention of
this problem or its potential cure.

Is there some way that I can format the data that will allow for
rapid read-in, or is there some other possible cure?

Paul Bailey
