[R] Creating a sparse matrix from a file

Martin Maechler maechler at stat.math.ethz.ch
Tue Oct 27 16:04:22 CET 2009


>>>>> "PP" == Pallavi P <pallavip.05 at gmail.com>
>>>>>     on Tue, 27 Oct 2009 18:13:22 +0530 writes:

    PP> Hi Martin,
    PP> Thanks for the help. Just to make sure I understand correctly.

    PP> The steps below create an example table similar to the one that I
    PP> read from the file.

yes, exactly

     n <- 22638
     m <- 80914
     nnz <- 300000 # no idea if this is realistic for you

     set.seed(101)
     ex <- cbind(i = sample(n,nnz, replace=TRUE),
                 j = sample(m,nnz, replace=TRUE),
                 x = round(100 * rnorm(nnz)))
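
To connect this to a real file: the following is only a minimal sketch, assuming the file holds "rowIndex,columnIndex,value" lines as in the first post of this thread. The temporary file and the column names i, j, x are stand-ins chosen for illustration, not taken from your data. dims= is passed explicitly so that the matrix gets the full 22638 x 80914 shape even when the last rows or columns contain no entries.

```r
library(Matrix)

## Hypothetical stand-in for the real file: two "rowIndex,columnIndex,value" lines
f <- tempfile(fileext = ".csv")
writeLines(c("1,2,14", "2,4,15"), f)

## Read the triplets; the column names are illustrative assumptions
ex <- read.table(f, sep = ",", col.names = c("i", "j", "x"))

## Build the sparse matrix in one call instead of assigning element by element
M <- sparseMatrix(i = ex$i, j = ex$j, x = as.numeric(ex$x),
                  dims = c(22638, 80914))

dim(M)        # dimensions as requested via dims=
M[1:3, 1:5]   # top-left corner, zeros shown as "."
```

With your own data, only the file name in read.table() should need to change.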


    PP> and I can understand the way sparseMatrix is initialized right now as
    M <- sparseMatrix(i = ex[,"i"],
                      j = ex[,"j"],
                      x = ex[,"x"])

    PP> However, I couldn't understand the use of the commands below.

   MM. <- tcrossprod(M) # == MM' := M %*% t(M)
   M.1 <- M %*% rep(1, ncol(M))
   stopifnot(identical(drop(M.1), rowSums(M)))

They were just for illustrative purposes: to show that, and how,
you can work with the sparse matrix 'M' once it has been created.
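
To see what the two operations compute, here is a small sketch on a tiny matrix; the 3 x 4 matrix 'S' and its values are made up purely for illustration.

```r
library(Matrix)

## Tiny hypothetical 3 x 4 sparse matrix (values chosen only for illustration)
S <- sparseMatrix(i = c(1, 2, 3, 3), j = c(2, 4, 1, 4), x = c(14, 15, 2, 3))

## tcrossprod(S) computes S %*% t(S) without forming t(S) explicitly
SS <- tcrossprod(S)
stopifnot(all.equal(as.matrix(SS), as.matrix(S %*% t(S)),
                    check.attributes = FALSE))

## Multiplying by a vector of ones sums each row, i.e. gives rowSums(S)
S1 <- S %*% rep(1, ncol(S))
stopifnot(all.equal(drop(as.matrix(S1)), rowSums(S)))
```

Both checks pass silently, and the results stay small because only the non-zero entries are stored.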

Regards,
Martin Maechler, ETH Zurich

    PP> Kindly let me know if I missed something.

    PP> Thanks
    PP> Pallavi

    PP> On Tue, Oct 27, 2009 at 4:12 PM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:

    >> 
    PP> Hi all,
    >> 
    PP> I used the SparseM package to create a sparse matrix and
    PP> followed the commands below.
    >> 
    >> I'd strongly recommend using the package 'Matrix', which is part of
    >> every R distribution (since R 2.9.0).
    >> 
    PP> The sequence of commands is:
    >> 
    >> >> ex <- read.table('fileName',sep=',')
    >> >> M <- as.matrix.csr(0,22638,80914)
    >> >> for (i in 1:nrow(ex)) { M[ex[i,1],ex[i,2]]<-ex[i,3]}
    >> 
    >> This is very slow in either 'Matrix' or 'SparseM'
    >> as soon as  nrow(ex)  is non-small.
    >> 
    >> However, there are very efficient ways to construct the sparse
    >> matrix directly from your 'ex' structure:
    >> In 'Matrix' you should use  the  sparseMatrix() function as you
    >> had proposed.
    >> 
    >> Here I provide a reproducible example,
    >> using a random 'ex':
    >> 
    >> 
    >> n <- 22638
    >> m <- 80914
    >> nnz <- 300000 # no idea if this is realistic for you
    >> 
    >> set.seed(101)
    >> ex <- cbind(i = sample(n,nnz, replace=TRUE),
    >>             j = sample(m,nnz, replace=TRUE),
    >>             x = round(100 * rnorm(nnz)))
    >> 
    >> library(Matrix)
    >> 
    >> M <- sparseMatrix(i = ex[,"i"],
    >>                   j = ex[,"j"],
    >>                   x = ex[,"x"])
    >> MM. <- tcrossprod(M) # == MM' := M %*% t(M)
    >> 
    >> M.1 <- M %*% rep(1, ncol(M))
    >> stopifnot(identical(drop(M.1), rowSums(M)))
    >> 
    >> ## ....  and now do other stuff with your sparse matrix M
    >> 
    >> 
    PP> Even after 4 hours, I can still see the above command running. But
    PP> I am not sure whether it got stuck somewhere.
    >> 
    PP> Also, when I initialize matrix M and try to display the values, I can
    PP> see something like this
    PP> [1] 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    PP> 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    PP> 2 2 2 2 2 2 2 2 2 2
    PP> [85] 2 2
    >> 
    PP> And after I stopped the above initialization from the table (after 4
    PP> hours), I could see different values.
    >> 
    PP> Could someone kindly explain what these numbers are about and how I can
    PP> test that my command is running and not just stuck somewhere?
    >> 
    PP> Also, it would be great if someone could point me to a tutorial, if any,
    PP> on sparse matrices in R, as I couldn't find one on the internet.
    >> 
    PP> Thanks
    PP> Pallavi
    >> 
    >> 
    >> 
    PP> Pallavi Palleti wrote:
    >> >>
    >> >> Hi David,
    >> >>
    >> >> Thanks for your help. This is exactly what I want.
    >> >> But my matrix has 25k rows and 80k columns. So,
    >> >> when I define a matrix object, it throws an error saying it cannot
    >> >> allocate a vector of length (25k * 80k). I heard that this data can
    >> >> still be loaded into R using sparseMatrix. However, I couldn't find the
    >> >> syntax for creating one. Could someone kindly help me in this regard?
    >> >>
    >> >> Thanks
    >> >> Pallavi
    >> >>
    >> >>
    >> >> David Winsemius wrote:
    >> >>>
    >> >>>
    >> >>> On Oct 26, 2009, at 5:06 AM, Pallavi Palleti wrote:
    >> >>>
    >> >>>>
    >> >>>> Hi all,
    >> >>>>
    >> >>>> I am new to R and learning it. I would like to create a sparse
    >> >>>> matrix from an existing file whose contents are in the format
    >> >>>> "rowIndex,columnIndex,value"
    >> >>>>
    >> >>>> for example:
    >> >>>> 1,2,14
    >> >>>> 2,4,15
    >> >>>>
    >> >>>> I would like to create a sparse matrix by taking the above as input.
    >> >>>> However, I couldn't find an example where the data was being read
    >> >>>> from a file. I searched the R tutorial and the web, but in vain.
    >> >>>> Could someone kindly show me how to give the above format as
    >> >>>> input in R to create a sparse matrix?
    >> >>>
    >> >>> ex <- read.table(textConnection("1,2,14
    >> >>> 2,4,15") , sep=",")
    >> >>> ex
    >> >>> #  V1 V2 V3
    >> >>> #1  1  2 14
    >> >>> #2  2  4 15
    >> >>>
    >> >>> M <- Matrix(0, 20, 20)
    >> >>>
    >> >>> > M
    >> >>> #20 x 20 sparse Matrix of class "dsCMatrix"
    >> >>>
    >> >>> [1,] . . . . . . . . . . . . . . . . . . . .
    >> >>> [2,] . . . . . . . . . . . . . . . . . . . .
    >> >>> [3,] . . . . . . . . . . . . . . . . . . . .
    >> >>> snip
    >> >>>
    >> >>> for (i in 1:nrow(ex) ) { M[ex[i, 1], ex[i, 2] ] <- ex[i, 3] }
    >> >>>
    >> >>> > M
    >> >>> 20 x 20 sparse Matrix of class "dgCMatrix"
    >> >>>
    >> >>> [1,] . 14 .  . . . . . . . . . . . . . . . . .
    >> >>> [2,] .  . . 15 . . . . . . . . . . . . . . . .
    >> >>> [3,] .  . .  . . . . . . . . . . . . . . . . .
    >> >>> snip
    >> >>> >
    >> >>> --
    >> >>>
    >> >>> David Winsemius, MD
    >> >>> Heritage Laboratories
    >> >>> West Hartford, CT
    >> >>>
    >> >>> ______________________________________________
    >> >>> R-help at r-project.org mailing list
    >> >>> https://stat.ethz.ch/mailman/listinfo/r-help
    >> >>> PLEASE do read the posting guide
    >> >>> http://www.R-project.org/posting-guide.html
    >> >>> and provide commented, minimal, self-contained, reproducible code.
    >> >>>
    >> >>>
    >> >>
    >> >>
    >> 