[R] Scaling rows of a large Matrix::sparseMatrix()

Gerrit Eichner Gerrit.Eichner at math.uni-giessen.de
Wed Jan 13 09:23:38 CET 2016


Hello, Dirk,

maybe I'm missing something, but, to avoid your for-loop approach, doesn't

M <- M/Matrix::rowSums(M)

do what you want?
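
A minimal sketch of why this should work, on a small made-up matrix (the values below are hypothetical, not your data): R recycles the row-sum vector down the columns, and since its length equals nrow(M), element (i, j) ends up divided by the i-th row sum. Multiplying from the left by a sparse diagonal matrix of reciprocals is an alternative that I would expect to give the same result. Both variants assume that no row sum is zero; such rows would need separate handling.

```r
library(Matrix)

## Small sparse example (hypothetical values for illustration only)
M <- sparseMatrix(i = c(1, 1, 2, 3, 3),
                  j = c(1, 3, 2, 1, 4),
                  x = c(2, 6, 5, 1, 3),
                  dims = c(3, 4))

rs <- Matrix::rowSums(M)

## Variant 1: recycling. length(rs) == nrow(M), and recycling runs
## down the columns, so entry (i, j) is divided by rs[i].
M1 <- M / rs

## Variant 2: left-multiply by a diagonal matrix holding 1/rs.
M2 <- Diagonal(x = 1 / rs) %*% M

## Sanity checks: every row now sums to 1, and both variants agree.
stopifnot(isTRUE(all.equal(unname(Matrix::rowSums(M1)), rep(1, nrow(M)))))
stopifnot(isTRUE(all.equal(as.matrix(M1), as.matrix(M2),
                           check.attributes = FALSE)))
```

Neither variant calls as.matrix() on the full matrix (it is used above only to compare the tiny results), so the 5.8 Gb dense object should not be needed.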

  Hth  --  Gerrit

---------------------------------------------------------------------
Dr. Gerrit Eichner                   Mathematical Institute, Room 212
gerrit.eichner at math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109            http://www.uni-giessen.de/eichner
---------------------------------------------------------------------

> Hello R-Users,
>
> I'm looking for a way to scale the rows of a sparse matrix M with about
> 57,000 rows, 14,000 columns, and 238,000 non-zero matrix elements; see
> example code below.
>
> Usually I'd use the base::scale() function (see sample code), but it
> freezes my computer. The same happens when I try to run a for loop over
> the matrix rows.
>
> Converting with as.matrix() yields a 5.8 GB object, which
> appears to be too large for scale().
>
>
> So my question is: How can the rows of a large sparse matrix be
> efficiently scaled?
>
> Thanks and regards,
>
> Dirk
>
>
> ### Hardware/Session Info
> Intel Core i7 w/ 12 GB RAM
> R version 3.2.1 (2015-06-18)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> Running under: Ubuntu 14.04.3 LTS
>
> ### Example Code
> library(Matrix)
> set.seed(42)
>
> ## These are example values resembling my real "problem matrix"
> N_ROW <- 56743
> N_COL <- 13648
> SIZE  <- 238283
> PROB <- c(0.050, 0.050, 0.099, 0.149, 0.198, 0.178, 0.119,
>          0.079, 0.0297, 0.0198, 0.001, 0.001, 0.001)
>
> ## get some random values to populate the sparse matrix
> x <- do.call(
>  what = rbind,
>  args = lapply(X = 1:N_ROW,
>                FUN = function(i)
>                  expand.grid(i,
>                    sample(x = 1:N_COL,
>                      size = sample(1:15, 1),
>                      replace = TRUE)
>                  )
>         )
> )
> x[,3] <- sample(x = 1:13, size = nrow(x),
>           replace = TRUE, prob = PROB)
>
> ## build the sparse matrix
> M <- Matrix::sparseMatrix(
>       dims = c(N_ROW, N_COL),
>       i = x[,1],
>       j = x[,2],
>       x = x[,3]
> )
> print(format(object.size(M), units = "auto"))
>
> ## *******************************************
> ## Scaling the rows of M
>
> ## scale() makes my computer freeze
> # M <- t(scale(t(M), center = FALSE, scale = Matrix::rowSums(M)))
>
> ## this is not elegant at all and takes forever
> # rwsms <- Matrix::rowSums(M)
> # for (i in 1:nrow(M)) M[i,] <- M[i,]/rwsms[[i]]


