[Rd] dgTMatrix Segmentation Fault

Sokol Serguei
Mon Jun 7 10:00:13 CEST 2021


On 07/06/2021 at 09:00, Dario Strbenac wrote:
> Good day,
>
> I notice that summing rows of a large dgTMatrix fails.
>
> library(Matrix)
> aMatrix <- new("dgTMatrix",
>                i = as.integer(sample(200000, 10000) - 1),
>                j = as.integer(sample(50000, 10000) - 1),
>                x = rnorm(10000),
>                Dim = c(200000L, 50000L))
> totals <- rowSums(aMatrix == 0)  # Segmentation fault.

On my R v4.1 (Ubuntu 18), I don't get a segfault, but I do get an error
message:

Error in h(simpleError(msg, call)) :
   error in evaluating the argument 'x' in selecting a method for function 'rowSums': cannot allocate vector of size 372.5 Gb

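The reason for this is quite clear: the intermediate logical matrix
'aMatrix == 0' is almost fully dense, since it has
200000L*50000L - 10000L nonzero (TRUE) entries. That is a little too
much ;) for my modest laptop. The number of zeros in a row equals the
number of columns minus the number of nonzeros in that row, and
'aMatrix != 0' stays sparse, so I can propose a workaround: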

     totals <- 50000 - rowSums(aMatrix != 0)
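
The workaround can be checked against the dense computation on a matrix
small enough to fit in memory. This is only a sketch: the reduced
dimensions (2000 x 500 with 100 entries) and the names 'small',
'dense_count' and 'sparse_count' are illustrative assumptions, not part
of the original report.

```r
library(Matrix)

set.seed(1)
# Same construction as in the report, scaled down (sizes are assumptions).
small <- new("dgTMatrix",
             i = as.integer(sample(2000, 100) - 1),
             j = as.integer(sample(500, 100) - 1),
             x = rnorm(100),
             Dim = c(2000L, 500L))

# Dense reference: materialises the full 2000 x 500 logical matrix.
dense_count <- rowSums(as.matrix(small) == 0)

# Sparse workaround: 'small != 0' stays sparse, so only the stored
# entries are touched; ncol() replaces the hard-coded column count.
sparse_count <- ncol(small) - rowSums(small != 0)

# Both approaches should give the same number of zeros per row.
stopifnot(all(dense_count == sparse_count))
```

Using ncol(aMatrix) instead of the literal 50000 keeps the expression
valid if the dimensions change.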

Hoping it helps.

Best,
Serguei.

>
> The server has 768 GB of RAM and it was never close to being consumed by this. Converting it to an ordinary matrix works fine.
>
> big <- as.matrix(aMatrix)
> totals <- rowSums(big == 0)      # Uses more RAM but there is no segmentation fault and result is returned.
>
> Could this be made more robust for dgTMatrix?
>
> --------------------------------------
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia
