[Rd] dgTMatrix Segmentation Fault

Ben Bolker bbo|ker @end|ng |rom gm@||@com
Thu Jun 10 03:11:18 CEST 2021


   Nice!

On 6/9/21 9:00 PM, Dario Strbenac via R-devel wrote:
> Good day,
> 
> Thanks to handy hints from Martin Morgan, I ran R under gdb and checked for any numeric overflow. We pinpointed the cause:
> 
> (gdb) info locals
> i = 0
> j = 10738
> m = 200000
> n = 50000
> ans = 0x55555b332790
> aa = 0x55555b3327c0
> 
> The relevant line of C code in dgeMatrix.c is
> 
>     for (i = 0; i < m; i++) aa[i] += xx[i + j * m];
> 
> i, j and m are all int, so i + j * m is evaluated in int arithmetic and overflows:
> (lldb) print 0 + 10738 * 200000
> (int) $5 = -2147367296
> 
> So, either the code should check that this doesn't occur, or be adjusted to allow for large indexes.
> 
> If anyone is interested, this is in the context of single-cell ATAC-seq data, which typically has about 200000 genomic regions (rows) and perhaps 100000 biological cells (columns).
> 
> --------------------------------------
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia
>
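
For readers following the quoted diagnosis, here is a minimal sketch in C of one way the index computation could be widened so that i + j * m no longer wraps. It is an illustration only, not the actual fix applied to the Matrix package: the helper names col_major_offset, offset_fits_int and row_sums are hypothetical, and the sketch assumes a 32-bit int, as on the platform in the gdb session above.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not part of the Matrix package): column-major
 * offset of element (i, j) in a matrix with m rows. Each operand is
 * cast to 64 bits BEFORE the multiplication, so j * m cannot overflow
 * a 32-bit int. */
int64_t col_major_offset(int i, int j, int m)
{
    assert(i >= 0 && j >= 0 && m >= 0);
    return (int64_t)i + (int64_t)j * (int64_t)m;
}

/* Hypothetical guard: nonzero if the offset is still representable as
 * a plain int, i.e. if the original int-only expression would be safe. */
int offset_fits_int(int i, int j, int m)
{
    return col_major_offset(i, j, m) <= (int64_t)INT_MAX;
}

/* Accumulation of the same shape as the quoted loop: adds every column
 * of an m x n column-major matrix xx into aa (length m), using the
 * widened offset instead of i + j * m. */
void row_sums(const double *xx, int m, int n, double *aa)
{
    for (int i = 0; i < m; i++)
        aa[i] = 0.0;
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            aa[i] += xx[col_major_offset(i, j, m)];
}
```

With the values from the debugger (i = 0, j = 10738, m = 200000) the widened offset is 2147600000, which exceeds INT_MAX (2147483647 for a 32-bit int), so offset_fits_int returns 0. Either approach mentioned in the quoted message could build on this: refuse such inputs using the guard, or index with the 64-bit offset throughout.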


