[BioC] IRanges coverage integer limit?

Nicolas Delhomme delhomme at embl.de
Wed Jul 4 11:16:39 CEST 2012


Great, thanks!

Hervé - how much effort would it be to extend it to numeric? I'm willing to do it, I just do not want to start on something that even YOU would say is tough ;-)

Nico

---------------------------------------------------------------
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------





On Jul 3, 2012, at 8:00 PM, Hervé Pagès wrote:

> On 07/03/2012 09:40 AM, Nicolas Delhomme wrote:
>> Hi,
>> 
>> I've just discovered that the IRanges coverage function can "overflow" without warning. Below is an example that reproduces it:
>> 
>> library(IRanges)
>> rngs <- IRanges(c(1:100),width=100)
>> coverage(rngs)
>> 
>> 'integer' Rle of length 199 with 199 runs
>>   Lengths:  1  1  1  1  1  1  1  1  1  1  1 ...  1  1  1  1  1  1  1  1  1  1
>>   Values :  1  2  3  4  5  6  7  8  9 10 11 ... 10  9  8  7  6  5  4  3  2  1
>> 
>> coverage(rngs,weight=1e9)
>> 
>> 'integer' Rle of length 200 with 200 runs
>>   Lengths:           1           1           1 ...           1           1
>>   Values :  1000000000  2000000000 -1294967296 ...  1000000000           0
>> 
>> runValue(coverage(rngs,weight=1e9))
>>   [1]  1000000000  2000000000 -1294967296  -294967296   705032704  1705032704
>>   [7] -1589934592  -589934592   410065408  1410065408 -1884901888  -884901888
>> ...
>> 
>> Clearly, the third position, which has an unweighted coverage of 3, has a weighted coverage of 3e9, which is > 2^31 (the signed integer limit on most machines). I'm just surprised that the overflow is silently ignored.
>> 
>> For NGS, getting a per-bp coverage > 2^31 is unlikely, although I've already seen extremely high coverage for ribosomal-like proteins that was only about 3 orders of magnitude away (~2M X). This limits the range of weights that can be used (as of now, weights can only be integers), i.e. a weight of 100 would already be borderline.
>> 
>> Is there a way around this, coverage being such a very handy function? I understand that weight being integers probably makes computation faster, but what could be the overhead of allowing numeric instead? And I don't mind looking under the hood if that helps.
> 
> Thanks Nico for catching this other one. I will keep operations in the
> int space for now (so an 'integer' Rle is always returned) but will make
> sure a warning is issued and NAs are returned in case of overflow.
> 
> H.
> 
>> 
>> Cheers,
>> 
>> Nico
>> 
>> sessionInfo()
>> R version 2.15.1 (2012-06-22)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>> 
>> locale:
>> [1] C/UTF-8/C/C/C/C
>> 
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>> 
>> other attached packages:
>> [1] IRanges_1.15.17    BiocGenerics_0.3.0
>> 
>> loaded via a namespace (and not attached):
>> [1] stats4_2.15.1 tools_2.15.1
>> 
> 
> 
> -- 
> Hervé Pagès
> 
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
> 
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
> 
> 

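For readers of the archive: the negative values in the report above are consistent with 32-bit two's-complement wraparound of the true weighted coverage. The following is a minimal base-R sketch of that arithmetic, not the IRanges implementation. `wrap_int32` is a hypothetical helper introduced only for illustration, and `depth` assumes the first seven unweighted coverage values shown in the report.

```r
# Emulate 32-bit two's-complement wraparound using double arithmetic.
# NOTE: wrap_int32 is a hypothetical helper, not part of IRanges or base R.
wrap_int32 <- function(x) ((x + 2^31) %% 2^32) - 2^31

depth  <- 1:7   # unweighted coverage at the first seven positions
weight <- 1e9   # the weight used in the report

# integer * double is computed in double precision, so nothing overflows here
true_cov <- depth * weight

# what a 32-bit signed integer would hold after silent wraparound
wrapped_cov <- wrap_int32(true_cov)

# positions whose true weighted coverage exceeds the integer limit
overflowed <- true_cov > .Machine$integer.max

print(format(wrapped_cov, scientific = FALSE))
print(overflowed)
```

Under these assumptions the wrapped values match the first seven entries of the `runValue()` output quoted above, and every position from the third onward is flagged as exceeding `.Machine$integer.max`.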

