[R] Scaling rows of a large Matrix::sparseMatrix()
tomdharray at gmail.com
Wed Jan 13 02:50:54 CET 2016
Hello R-Users,
I'm looking for a way to scale the rows of a sparse matrix M with about
57,000 rows, 14,000 columns, and 238,000 non-zero matrix elements; see
example code below.
Usually I'd use the base::scale() function (see sample code), but it
freezes my computer. The same happens when I try to run a for loop over
the matrix rows.
Converting M with as.matrix() yields a 5.8 Gb object, which appears to
be too large for scale().
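As a rough check on that figure, here is a back-of-the-envelope sketch. It assumes 8 bytes per double and ignores object overhead, and it reads "Gb" as the 1024-based unit that object.size() reports; the variable names are illustrative only.

```r
## Approximate size of the dense equivalent of M
## (assumption: 8 bytes per double, no attribute/overhead bytes counted)
n_row <- 56743
n_col <- 13648
dense_bytes <- n_row * n_col * 8
dense_gib <- dense_bytes / 1024^3
print(round(dense_gib, 1))
```

That is roughly half of the 12 Gb of RAM listed below, before any function makes its own working copies.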
So my question is: How can the rows of a large sparse matrix be
efficiently scaled?
Thanks and regards,
Dirk
### Hardware/Session Info
Intel Core i7 w/ 12 Gb RAM
R version 3.2.1 (2015-06-18)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS
### Example Code
library(Matrix)
set.seed(42)
## These are exemplary values for my real "problem matrix"
N_ROW <- 56743
N_COL <- 13648
SIZE <- 238283
PROB <- c(0.050, 0.050, 0.099, 0.149, 0.198, 0.178, 0.119,
          0.079, 0.0297, 0.0198, 0.001, 0.001, 0.001)
## get some random values to populate the sparse matrix
x <- do.call(
  what = rbind,
  args = lapply(X = 1:N_ROW,
                FUN = function(i)
                  expand.grid(i,
                              sample(x = 1:N_COL,
                                     size = sample(1:15, 1),
                                     replace = TRUE)
                  )
  )
)
x[,3] <- sample(x = 1:13, size = nrow(x),
                replace = TRUE, prob = PROB)
## build the sparse matrix
M <- Matrix::sparseMatrix(
  dims = c(N_ROW, N_COL),
  i = x[,1],
  j = x[,2],
  x = x[,3]
)
print(format(object.size(M), units = "auto"))
## *******************************************
## Scaling the rows of M
## scale() makes my computer freeze
# M <- t(scale(t(M), center = FALSE, scale = Matrix::rowSums(M)))
## this is not elegant at all and takes forever
# rwsms <- Matrix::rowSums(M)
# for (i in 1:nrow(M)) M[i,] <- M[i,]/rwsms[[i]]
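
One possible way to scale the rows without ever building a dense matrix is to left-multiply by a sparse diagonal matrix of inverse row sums. The following is only a minimal sketch on a small stand-in matrix: M_small, rs and inv_rs are illustrative names that do not appear in the code above, and leaving empty rows as zeros is an assumption about the desired behaviour.

```r
library(Matrix)

## Small stand-in for the large matrix M (toy data, not from the post)
M_small <- sparseMatrix(
  i = c(1, 1, 2, 3, 3, 3),
  j = c(1, 3, 2, 1, 2, 4),
  x = c(1, 3, 5, 2, 2, 4),
  dims = c(4, 4)          # row 4 is deliberately empty
)

rs <- Matrix::rowSums(M_small)

## Assumption: empty rows should stay all-zero rather than become NaN
inv_rs <- ifelse(rs == 0, 0, 1 / rs)

## Left-multiplying by a diagonal matrix scales row i by inv_rs[i];
## the product stays sparse, so no dense copy is created
M_scaled <- Diagonal(x = inv_rs) %*% M_small

print(Matrix::rowSums(M_scaled))
```

Applied to the matrix above, the same idea would read M <- Diagonal(x = 1 / Matrix::rowSums(M)) %*% M, provided no row sum is zero. The product only touches the stored non-zero entries, which is why it should avoid the memory problem that scale() and the row-wise loop run into.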