[BioC] duplicated on IRanges object

Manuela Hummel manuela.hummel at crg.es
Fri Oct 22 16:44:30 CEST 2010


Hi,

There seems to be a numerical issue when applying 'duplicated' to an IRanges object.
When two ranges are almost the same (same start, widths differing by 1) and the IRanges object also contains another range with a huge width, 'duplicated' identifies the two "almost the same" ranges as "the same".

If we take for example those two ranges:

> ir <- IRanges(start=rep(1000000000, 2), width=200:201)
> ir
IRanges of length 2
         start        end width
[1] 1000000000 1000000199   200
[2] 1000000000 1000000200   201


They are obviously not the same: 

> duplicated(ir)
[1] FALSE FALSE


But when we now add another range with huge width:

> ir2 <- c(ir, IRanges(start=5000000, width=95000001))
> ir2
IRanges of length 3
         start        end    width
[1] 1000000000 1000000199      200
[2] 1000000000 1000000200      201
[3]    5000000  100000000 95000001


... the second range is reported as a duplicate of the first:

> duplicated(ir2)
[1] FALSE  TRUE FALSE


I guess the problem is that in .toNumericWithCompatibleOrder the variable max_width becomes so large that
start(x) + width(x)/(max_width+1.00)
is numerically identical for ranges like the first two in the example.
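
A minimal sketch of this effect in plain R, without IRanges. It assumes the key is computed exactly as in the expression quoted above; 'range_key' is a hypothetical helper name used only for illustration, not the actual body of .toNumericWithCompatibleOrder. The last step shows one possible exact comparison on (start, width) pairs, which is an assumption about how a fix could look, not how the package does it.

```r
# Values taken from the first two ranges in the example.
start <- c(1000000000, 1000000000)
width <- c(200, 201)

# Hypothetical helper mirroring the quoted expression:
#   start(x) + width(x) / (max_width + 1.00)
range_key <- function(start, width, max_width) {
  start + width / (max_width + 1)
}

# Only the two narrow ranges present: max_width is 201.
key_small <- range_key(start, width, max_width = 201)
print(duplicated(key_small))   # FALSE FALSE

# With the wide range present: max_width is 95000001.
key_large <- range_key(start, width, max_width = 95000001)
print(duplicated(key_large))   # FALSE TRUE

# Adjacent doubles near 1e9 are 2^-23 apart (about 1.19e-07),
# but the two keys should differ by only 1 / 95000002 (about 1.05e-08),
# so both sums round to the same double.
print(2^-23)
print(1 / 95000002)

# Exact alternative: compare (start, width) pairs directly,
# with no floating-point key involved.
exact_dup <- duplicated(data.frame(start = start, width = width))
print(exact_dup)               # FALSE FALSE
```

The pairwise comparison is independent of the widths of any other ranges in the object, which is the property the floating-point key loses here.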

Best regards
Manuela

PS: By the way, thanks for the great IRanges package! It makes working with sequence data so much easier.


> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252   
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
[5] LC_TIME=Spanish_Spain.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] IRanges_1.8.0



Manuela Hummel
Core Facilities - Microarrays Unit
Center for Genomic Regulation (CRG)
Dr. Aiguader 88, 4th floor, Office 439.01
08003 Barcelona
Phone: +34 93 316 0373
e-mail: manuela.hummel at crg.es
  


