[R] Creating a sparse matrix from a file

Martin Maechler maechler at stat.math.ethz.ch
Tue Oct 27 11:42:46 CET 2009


    PP> Hi all,

    PP> I used the SparseM package for creating a sparse matrix and
    PP> followed the commands below.

I'd strongly recommend using package 'Matrix', which is part of
every R distribution (since R 2.9.0).

    PP> The sequence of commands is:

    >> ex <- read.table('fileName',sep=',')
    >> M <- as.matrix.csr(0,22638,80914)
    >> for (i in 1:nrow(ex)) { M[ex[i,1],ex[i,2]]<-ex[i,3]}

This is very slow in either 'Matrix' or 'SparseM'
as soon as nrow(ex) is not small.

However, there are very efficient ways to construct the sparse
matrix directly from your 'ex' structure.
In 'Matrix', you should use the sparseMatrix() function, as you
had proposed.
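
Applied to your file format ("rowIndex,columnIndex,value", no header
line), a minimal sketch could look like this. It assumes 1-based
indices, uses the two sample lines from your first post via the 'text'
argument of read.table() in place of your real file, and takes the
22638 x 80914 size from your as.matrix.csr() call:

```r
library(Matrix)

## Stand-in for read.table('fileName', sep = ','): the two sample lines
## from the original post, in "rowIndex,columnIndex,value" format
ex <- read.table(text = "1,2,14
2,4,15", sep = ",", col.names = c("i", "j", "x"))

## Build the sparse matrix in one call (1-based indices assumed);
## 'dims' keeps the full size even if the last rows/columns are empty
M <- sparseMatrix(i = ex$i, j = ex$j, x = ex$x,
                  dims = c(22638, 80914))

M[1:2, 1:4]   # top-left corner, containing both entries
```

For the real data, replace the 'text = ...' argument by your file name.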

Here I provide a reproducible example,
using a random 'ex':


n <- 22638
m <- 80914
nnz <- 300000 # no idea if this is realistic for you

set.seed(101)
ex <- cbind(i = sample(n,nnz, replace=TRUE),
            j = sample(m,nnz, replace=TRUE),
            x = round(100 * rnorm(nnz)))

library(Matrix)

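## build the sparse matrix directly from the (i, j, x) triplets;
## 'dims' keeps the full n x m size even if the last rows/columns are empty
M <- sparseMatrix(i = ex[,"i"],
                  j = ex[,"j"],
                  x = ex[,"x"],
                  dims = c(n, m))
MM. <- tcrossprod(M) # == MM' := M %*% t(M)

## check: multiplying by a vector of ones must give the row sums
M.1 <- M %*% rep(1, ncol(M))
stopifnot(identical(drop(M.1), rowSums(M)))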

## ....  and now do other stuff with your sparse matrix M
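
One difference to keep in mind when replacing the loop: if the file
contains the same (row, column) pair more than once, the loop keeps the
last value, whereas sparseMatrix() adds the values up by default; its
'use.last.ij = TRUE' argument keeps only the last one instead. A small
sketch with hypothetical triplets where the pair (1, 2) occurs twice:

```r
library(Matrix)

## Hypothetical triplets: the pair (1, 2) occurs twice
ex <- data.frame(i = c(1, 2, 1),
                 j = c(2, 4, 2),
                 x = c(14, 15, 3))

## Element-wise loop (slow for large data): the last value wins
M.loop <- Matrix(0, 3, 4, sparse = TRUE)
for (k in seq_len(nrow(ex))) M.loop[ex$i[k], ex$j[k]] <- ex$x[k]

## sparseMatrix(): values of repeated pairs are summed by default ...
M.sum  <- sparseMatrix(i = ex$i, j = ex$j, x = ex$x, dims = c(3, 4))
## ... unless use.last.ij = TRUE, which mimics the loop
M.last <- sparseMatrix(i = ex$i, j = ex$j, x = ex$x, dims = c(3, 4),
                       use.last.ij = TRUE)

M.sum[1, 2]    # 14 + 3
M.last[1, 2]   # 3, as in the loop
```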

 
    PP> Even after 4 hours, I can still see the above command running, but I am not
    PP> sure whether it got stuck somewhere.
 
    PP> Also, when I initialize matrix M and try to display the values, I see
    PP> something like this:
    PP> [1] 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    PP> 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    PP> 2 2 2 2 2 2 2 2 2 2
    PP> [85] 2 2

    PP> And after I stopped the command that fills M from the table (after 4
    PP> hours), I could see different values.

    PP> Could someone kindly explain what these numbers are and how I can test
    PP> that my command is running and not just stuck somewhere?

    PP> Also, it would be great if someone could point me to a tutorial, if any, on
    PP> sparse matrices in R, as I couldn't find one on the internet.

    PP> Thanks
    PP> Pallavi



    PP> Pallavi Palleti wrote:
    >> 
    >> Hi David,
    >> 
    >> Thanks for your help. This is exactly what I want.
    >> But my matrix has 25k rows and 80k columns. So,
    >> when I define a matrix object, it throws an error saying it cannot
    >> allocate a vector of length (25k * 80k). I heard that this data can still
    >> be loaded into R using sparseMatrix. However, I couldn't find the syntax for
    >> creating it. Could someone kindly help me with this?
    >> 
    >> Thanks
    >> Pallavi
    >> 
    >> 
    >> David Winsemius wrote:
    >>> 
    >>> 
    >>> On Oct 26, 2009, at 5:06 AM, Pallavi Palleti wrote:
    >>> 
    >>>> 
    >>>> Hi all,
    >>>> 
    >>>> I am new to R and learning it. I would like to create a sparse matrix
    >>>> from an existing file whose contents are in the format
    >>>> "rowIndex,columnIndex,value"
    >>>> 
    >>>> for example:
    >>>> 1,2,14
    >>>> 2,4,15
    >>>> 
    >>>> I would like to create a sparse matrix by taking the above as input.
    >>>> However, I couldn't find an example where the data was read from a
    >>>> file. I searched the R tutorial and also the web, but in vain.
    >>>> Could someone kindly show me how to give the above format as input
    >>>> in R to create a sparse matrix?
    >>> 
    >>> ex <- read.table(textConnection("1,2,14
    >>> 2,4,15") , sep=",")
    >>> ex
    >>> #  V1 V2 V3
    >>> #1  1  2 14
    >>> #2  2  4 15
    >>> 
    >>> M <- Matrix(0, 20, 20)
    >>> 
    >>> > M
    >>> #20 x 20 sparse Matrix of class "dsCMatrix"
    >>> 
    >>> [1,] . . . . . . . . . . . . . . . . . . . .
    >>> [2,] . . . . . . . . . . . . . . . . . . . .
    >>> [3,] . . . . . . . . . . . . . . . . . . . .
    >>> snip
    >>> 
    >>> for (i in 1:nrow(ex) ) { M[ex[i, 1], ex[i, 2] ] <- ex[i, 3] }
    >>> 
    >>> > M
    >>> 20 x 20 sparse Matrix of class "dgCMatrix"
    >>> 
    >>> [1,] . 14 .  . . . . . . . . . . . . . . . . .
    >>> [2,] .  . . 15 . . . . . . . . . . . . . . . .
    >>> [3,] .  . .  . . . . . . . . . . . . . . . . .
    >>> snip
    >>> >
    >>> --
    >>> 
    >>> David Winsemius, MD
    >>> Heritage Laboratories
    >>> West Hartford, CT
    >>> 
    >>> ______________________________________________
    >>> R-help at r-project.org mailing list
    >>> https://stat.ethz.ch/mailman/listinfo/r-help
    >>> PLEASE do read the posting guide
    >>> http://www.R-project.org/posting-guide.html
    >>> and provide commented, minimal, self-contained, reproducible code.
    >>> 
    >>> 
    >> 
    >> 
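
As for the "cannot allocate" error: a dense 22638 x 80914 matrix of
doubles needs 8 bytes per cell, i.e. about 14.7 GB, whereas a
"dgCMatrix" essentially stores one double value and one integer row
index per non-zero entry, plus one integer column pointer per column.
A rough back-of-the-envelope sketch, assuming the 300000 non-zeros of
the random example above (object.size() additionally counts a small
fixed overhead for the object's slots):

```r
library(Matrix)

n <- 22638; m <- 80914   # dimensions from this thread
nnz <- 300000            # assumed number of non-zero entries

## Dense storage: one 8-byte double per cell
dense.bytes <- 8 * n * m
dense.bytes / 2^30       # about 13.6 GiB

## Compressed sparse column ("dgCMatrix") storage, roughly:
## 8-byte value + 4-byte row index per non-zero,
## plus a 4-byte column pointer per column (+1)
sparse.bytes <- nnz * (8 + 4) + 4 * (m + 1)
sparse.bytes / 2^20      # about 3.7 MiB

## Compare with the actual size of a random matrix of that shape
set.seed(101)
M <- sparseMatrix(i = sample(n, nnz, replace = TRUE),
                  j = sample(m, nnz, replace = TRUE),
                  x = rnorm(nnz), dims = c(n, m))
print(object.size(M), units = "MB")
```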




