[R] sparse matrix, rnorm, malloc

roger koenker roger at ysidro.econ.uiuc.edu
Sun Jun 11 01:13:31 CEST 2006


Here is an example of how one might do this sort of thing in SparseM,
ignoring the rounding aspect:

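require(SparseM)
require(msm) # for rtnorm
# rnd is kept in the signature but is unused, since rounding is ignored here
sm <- function(dim,rnd,q){
         # number of stored entries: Binomial(dim^2, P(|Z| < q))
         n <- rbinom(1, dim * dim, 2 * pnorm(q) - 1)
         # random row and column indices (repeated pairs are possible)
         ia <- sample(dim,n,replace = TRUE)
         ja <- sample(dim,n,replace = TRUE)
         # values from a standard normal truncated to (-q, q)
         ra <- rtnorm(n,lower = -q, upper = q)
         A <- new("matrix.coo", ia = as.integer(ia), ja = as.integer(ja),
                  ra = ra, dimension = as.integer(c(dim,dim)))
         # convert to compressed sparse row form; the value is returned
         # invisibly, so assign it: A <- sm(...)
         A <- as.matrix.csr(A)
         }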
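One caveat about the index sampling above: drawing ia and ja independently with replacement can produce the same (row, column) pair more than once. If distinct positions matter, a minimal base-R sketch along the following lines could be used instead. The helper name distinct_positions is hypothetical (it is not part of SparseM or of the code above), and how repeated pairs are treated when a matrix.coo object is converted is not something this sketch assumes anything about.

```r
# Hypothetical helper (not from the original post): draw n distinct
# (row, col) positions in a dim x dim matrix using base R only.
distinct_positions <- function(dim, n) {
  stopifnot(n <= dim * dim)
  # linear cell indices in 1..dim^2, sampled without replacement
  cell <- sample.int(dim * dim, n)
  list(
    ia = as.integer((cell - 1) %% dim + 1),   # row index (column-major layout)
    ja = as.integer((cell - 1) %/% dim + 1)   # column index
  )
}

set.seed(1)
pos <- distinct_positions(100, 250)
# anyDuplicated() returns 0 when no (row, col) pair is repeated
anyDuplicated(cbind(pos$ia, pos$ja))
```

The resulting ia and ja could then be passed to new("matrix.coo", ...) in place of the two sample() calls.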

For dim = 5000 and q = .03, which gives a density of 2 * pnorm(.03) - 1,
or about 2.4 percent, and so exceeds Gavin's suggested 1 percent
density, this takes about 30 seconds on my iMac, and according to Rprof
about 95 percent of that (total) time is spent generating the
truncated normals.
Word of warning: pushing this much further gets tedious, since the
number of random numbers grows like dim^2. For example, dim = 20,000
and q = .02 takes 432 seconds, with again 93% of the total time spent in
rnorm and rtnorm...
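
To make the dim^2 growth concrete, here is a small base-R sketch of the arithmetic, followed by one possible way to draw the truncated normals without rtnorm. The names expected_nnz and rtnorm_inv are hypothetical helpers introduced only for illustration. The second one uses plain inverse-CDF sampling, which is a swapped-in technique and not what msm does internally as far as this sketch assumes; whether it is actually faster would need to be timed.

```r
# Expected number of stored entries in the construction above:
# dim^2 * P(|Z| < q) for a standard normal Z.
expected_nnz <- function(dim, q) dim^2 * (2 * pnorm(q) - 1)
expected_nnz(5000, 0.03)    # roughly 0.6 million
expected_nnz(20000, 0.02)   # roughly 6.4 million

# Dense storage for comparison, at 8 bytes per double:
20000^2 * 8 / 1024          # 3125000 Kb, the size in the quoted allocation error

# Hypothetical alternative to rtnorm(): inverse-CDF sampling in base R.
# Uniform draws between pnorm(-q) and pnorm(q) are mapped back through
# qnorm(), giving standard normal draws restricted to (-q, q).
rtnorm_inv <- function(n, q) qnorm(runif(n, pnorm(-q), pnorm(q)))

set.seed(1)
x <- rtnorm_inv(1e5, 0.03)
range(x)                    # both ends lie within (-q, q)
```

Inside sm(), the line that computes ra could be swapped for rtnorm_inv(n, q) to try this out; system.time() on both versions would show whether it helps on a given machine.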


url:    www.econ.uiuc.edu/~roger                Roger Koenker
email   rkoenker at uiuc.edu                       Department of Economics
vox:    217-333-4558                            University of Illinois
fax:    217-244-6678                            Champaign, IL 61820


On Jun 10, 2006, at 12:53 PM, g l wrote:

> Hi,
>
> I'm sorry for any cross-posting. I've reviewed the archives and could
> not find an exact answer to my question below.
>
> I'm trying to generate very large sparse matrices (< 1% non-zero
> entries per row). I have a sparse matrix function below which works
> well until the row/col count exceeds 10,000. This is being run on a
> machine with 32G memory:
>
> sparse_matrix <- function(dims,rnd,p) {
>          ptm <- proc.time()
>          x <- round(rnorm(dims*dims),rnd)
>          x[((abs(x) - p) < 0)] <- 0
>          y <- matrix(x,nrow=dims,ncol=dims)
>          proc.time() - ptm
> }
>
> When trying to generate the matrix around 20,000 rows/cols on a
> machine with 32G of memory, the error message I receive is:
>
> R(335) malloc: *** vm_allocate(size=3200004096) failed (error code=3)
> R(335) malloc: *** error: can't allocate region
> R(335) malloc: *** set a breakpoint in szone_error to debug
> R(335) malloc: *** vm_allocate(size=3200004096) failed (error code=3)
> R(335) malloc: *** error: can't allocate region
> R(335) malloc: *** set a breakpoint in szone_error to debug
> Error: cannot allocate vector of size 3125000 Kb
> Error in round(rnorm(dims * dims), rnd) : unable to find the argument
> 'x' in selecting a method for function 'round'
>
> * Last error line is obvious. Question:  on machine w/32G memory, why
> can't it allocate a vector of size 3125000 Kb?
>
> When trying to generate the matrix around 30,000 rows/cols, the error
> message I receive is:
>
> Error in rnorm(dims * dims) : cannot allocate vector of length 900000000
> Error in round(rnorm(dims * dims), rnd) : unable to find the argument
> 'x' in selecting a method for function 'round'
>
> * Last error line is obvious. Question: is this 900000000 bytes?
> kilobytes? This error seems to be specific now to rnorm, but it
> doesn't indicate the length metric (b/Kb/Mb) as it did for 20,000
> rows/cols. Even if this is Mb, why can't this be allocated on a machine
> with 32G free memory?
>
> When trying to generate the matrix with over 50,000 rows/cols, the
> error message I receive is:
>
> Error in rnorm(n, mean, sd) : invalid arguments
> In addition: Warning message:
> NAs introduced by coercion
> Error in round(rnorm(dims * dims), rnd) : unable to find the argument
> 'x' in selecting a method for function 'round'
>
> * Same.
>
> Why would it generate different errors in each case? Code fixes? Any
> simple ways to generate sparse matrices which would avoid above
> problems?
>
> Thanks in advance,
>
> Gavin
>


