[BioC] distances for IRanges
Kasper Daniel Hansen
kasperdanielhansen at gmail.com
Wed Jun 9 02:36:43 CEST 2010
Thanks for pointing out nearest and friends; I agree that this
function should address my question.
Reading the man page for the nearest() function, might I suggest an
additional argument like
multihits = c("arbitrary", "all")
with the intention that a user can get full information in case one
range overlaps (or ties in distance) with multiple other ranges. The
return value could be a sparse matrix, findOverlaps-like. I find it
important to know about multiple hits, especially in the case when a
range has multiple overlaps.
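
As a rough illustration of what an "all" mode could return, here is a base-R sketch that reports every tied nearest neighbour as a two-column (query, subject) matrix. The helper name nearest_all and its arguments are hypothetical, not part of IRanges, and it works on plain start/end vectors rather than IRanges objects. It builds the full distance matrix, so it only shows the shape of the result and is not an efficient implementation.

```r
# Hypothetical helper, for illustration only (not an IRanges function).
# qs/qe: query starts and ends; ss/se: subject starts and ends.
nearest_all <- function(qs, qe, ss, se) {
  # Gap between every query/subject pair; 0 for overlapping or adjacent ranges.
  d <- pmax(outer(qe, ss, function(e, s) s - e - 1L),
            outer(qs, se, function(s, e) s - e - 1L),
            0L)
  # Keep every pair that attains the row minimum, so ties are all reported.
  hits <- which(d == apply(d, 1, min), arr.ind = TRUE)
  hits <- hits[order(hits[, 1], hits[, 2]), , drop = FALSE]
  colnames(hits) <- c("query", "subject")
  hits
}

# Query 1 is equally far (2 positions) from subjects 1 and 2;
# query 2 overlaps subject 2 only.
hits <- nearest_all(qs = c(5L, 9L),        qe = c(6L, 9L),
                    ss = c(1L, 9L, 20L),   se = c(2L, 10L, 21L))
hits
```

With an "arbitrary" mode only one of the two rows for query 1 would be returned, which is the information loss described above.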
On Tue, Jun 8, 2010 at 1:25 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
> For all pairwise distances, something simple based on outer() should
> suffice. It might not be very space efficient, but speed should be somewhat
> close to optimal.
> What is the end goal of this? For example, the nearest() function finds
> nearest neighbors efficiently.
> You might be able to leverage findOverlaps(). For example, one can set the
> maximum gap between ranges to be considered overlapping. That could be set
> to a non-zero value representing some maximum allowable distance. The sparse
> doublet matrix from as.matrix() would be pretty efficient for distance
> calculation, via the pgap() function.
> On Tue, Jun 8, 2010 at 8:51 AM, Kasper Daniel Hansen
> <kasperdanielhansen at gmail.com> wrote:
>> Assuming I have two IRanges, each with multiple ranges, like
>> ir1 = IRanges(start = 3:6, width = 2)
>> ir2 = IRanges(start = 10:17, width = 2)
>> Is there a fast way to compute a pairwise distance matrix between the
>> two sets, by which I mean
>> ii = 1
>> jj = 2
>> width(gaps(c(ir1[ii], ir2[jj])))
>> where ii, jj would index into a result matrix. Essentially this would
>> be an expanded version of findOverlaps, since any two ranges with
>> distance = 0, have an overlap.
>> Is such functionality available in IRanges, in an efficient
>> implementation (think of the case where the two IRanges have - say -
>> 10,000 ranges or more)?
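
A minimal sketch of the pairwise distance matrix asked about here, following the outer() suggestion earlier in the thread. It uses plain integer start/end vectors so that it runs without IRanges; with IRanges objects the same vectors would come from start() and end(). The variable names are illustrative, and treating overlapping or adjacent ranges as distance 0 is an assumption of this sketch.

```r
# Starts and ends of the two example sets (both have width 2).
s1 <- 3:6;   e1 <- s1 + 1L   # corresponds to ir1
s2 <- 10:17; e2 <- s2 + 1L   # corresponds to ir2

# Gap when range i of set 1 lies left of range j of set 2, and the
# reverse case. At most one of the two can be positive for a given pair.
left_gap  <- outer(e1, s2, function(e, s) s - e - 1L)
right_gap <- outer(s1, e2, function(s, e) s - e - 1L)

# Overlapping or adjacent ranges get distance 0.
dist_mat <- pmax(left_gap, right_gap, 0L)

dist_mat[1, 2]  # 6, the same as width(gaps(c(ir1[1], ir2[2])))
```

The result is a dense length(ir1) x length(ir2) matrix, so for two sets of 10,000 ranges it holds 10^8 integers (about 400 MB), plus the two intermediate matrices. That is the space cost mentioned in the reply above.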