[BioC] GenomicRanges::nearest (especially follow)

Harris A. Jaffee hj at jhu.edu
Tue Oct 16 18:25:28 CEST 2012


Fantastic, thank you.  I'm not sure I knew about that page!

For example, ?precede gives me the IRanges nearest-methods page,
and only if I explicitly request ?'nearest-methods' can I choose
the GenomicRanges page.  That brings up another issue: ?distance
again sends me to the IRanges nearest-methods help, and that only
documents distanceToNearest, which is not enough.  The spec for
distance actually lives only on
?'precede,GenomicRanges,GenomicRanges-method', as far as I see.

Finally, and admittedly a total edge case, but zero-width ranges
appear to be mishandled, or not handled, by the IRanges distance
code.  I think they should be infinitely far from anything, or at
least NA, and there should not be any nearest or distanceToNearest
value for a zero-width query.

> x
IRanges of length 1
    start end width
[1]     2   1     0

> distance(x, x)
[1] 1

> y
IRanges of length 1
    start end width
[1]     3   2     0

> distance(x, y)
[1] 2

> z
IRanges of length 1
    start end width
[1]     3   3     1

> distance(x, z)
[1] 2

On Oct 15, 2012, at 6:27 PM, Valerie Obenchain wrote:

> A few follow up items from an off-list conversation with Harris.
> 
> 1) I reversed the use of 5' and3' in my response below. Sorry about that.
> 
> 2) The primary landing page for ?precede is in IRanges. I've expanded the \seealso section of this page to point the user to the man page in GenomicRanges.
> 
> 3) The precede/follow man page in GenomicRanges now  specifically states that 5' to 3' is the relevant orientation for precede/follow. For those interested in taking a look, the man page has several aliases,
> 
> ?'nearest-methods'
> ?'precede,GenomicRanges,GenomicRanges-method'
> ?'follow,GenomicRanges,GenomicRanges-method'
> 
> Changes can be found in IRanges 1.17.3  and GenomicRanges 1.11.4.
> 
> Thanks Harris.
> 
> Valerie
> 
> On 10/12/2012 04:17 PM, Valerie Obenchain wrote:
>> Hi Harris,
>> 
>> On 10/11/2012 02:27 PM, Harris A. Jaffee wrote:
>>> Apologies in advance if I just don't understand the ignore.strand switch, or
>>> perhaps GRanges objects.  Also, I have not even tried to understand
>>> 
>>>    GenomicRanges:::.GenomicRanges_findPrecedeFollow
>>> 
>>> I have the impression that a query with strand entirely "*" essentially implies
>>> ignore.strand, but this case here is handled correctly only if ignore.strand is
>>> TRUE (consistent with distanceToNearest but not distance, which seems correct):
>>> 
>>>> x = GRanges(ranges=IRanges(start=5, end=5), seqnames="chr1", strand="*")
>>>> Y = GRanges(ranges=IRanges(start=c(6,7), end=c(6,7)), seqnames="chr1", strand=c("+","-"))
>>>> distance(x, Y[1])
>>> [1] 1
>>>> distance(x, Y[2])
>>> [1] 2
>>> 
>>>> nearest(x, Y, ignore.strand=TRUE)    # correct
>>> [1] 1
>>>> nearest(ranges(x), ranges(Y))           # also correct
>>> [1] 1
>>> 
>>> However,
>>> 
>>>> nearest(x, Y)
>>> [1] 2
>> 
>> Thanks for reporting this bug. Now fixed in GenomicRanges 1.11.3 and 1.10.2.
>> 
>> The problem was in .nearest() where I computed the preceding and following distance of the ranges. I was missing abs() so the distance for the second range was -2 instead of 2. When checking to see which range was closest
>> 
>> > 1 < -2
>> [1] FALSE
>> 
>> so the second range won.
>>>> distanceToNearest(x, Y)
>>> DataFrame with 1 row and 3 columns
>>>  queryHits subjectHits  distance
>>> <integer> <integer> <integer>
>>> 1         1           2         2
>>> 
>>>> distanceToNearest(x, Y, ignore.strand=TRUE)
>>> DataFrame with 1 row and 3 columns
>>>   queryHits subjectHits  distance
>>> <integer> <integer> <integer>
>>> 1         1           1         1
>>> 
>>> 
>>> Finally,
>>> 
>>>> follow(ranges(x), ranges(Y))
>>> [1] NA
>>> 
>>> The issue (along with GenomicRanges:::.nearest) must come down to this:
>>> 
>>>> follow(x, Y)                # how can this be?
>>> [1] 2
>> 
>> This behavior is due to the fact that a '*' range can compare itself to either '+' or '-'.  The 'x' is '*" located at position 5.  When comparing it to the '+'  Y[1]
>> we think of both ranges as '+'. precede() and follow() locations are determined by moving from 3' to 5'. On '+' strand this is from left to right so 5 precedes 6.
>> 
>> > precede(x, Y[1])
>> [1] 1
>> 
>> but does not follow it
>> 
>> > follow(x, Y[1])
>> [1] NA
>> 
>> When comparing the '*' to Y[2] we think of both ranges as '-'. On the '-' moving from  3' to 5' is right to left so the  5 follows 7
>> 
>> > follow(x, Y[2])
>> [1] 1
>> 
>> but does not precede it
>> 
>> > precede(x, Y[2])
>> [1] NA
>> 
>> When we set ignore.strand=TRUE, all ranges are considered '+'.
>> 
>> Valerie
>>> whereas
>>> 
>>>> follow(x, Y, ignore.strand=TRUE)    # correct, I think
>>> [1] NA
>>> 
>>>> sessionInfo()
>>> R Under development (unstable) (2012-10-10 r60908)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>> 
>>> locale:
>>>  [1] LC_CTYPE=en_US.iso885915       LC_NUMERIC=C
>>>  [3] LC_TIME=en_US.iso885915        LC_COLLATE=en_US.iso885915
>>>  [5] LC_MONETARY=en_US.iso885915    LC_MESSAGES=en_US.iso885915
>>>  [7] LC_PAPER=C                     LC_NAME=C
>>>  [9] LC_ADDRESS=C                   LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C
>>> 
>>> attached base packages:
>>> [1] stats     graphics  grDevices datasets  utils     methods   base
>>> 
>>> other attached packages:
>>> [1] GenomicRanges_1.11.0 IRanges_1.17.0       BiocGenerics_0.5.0
>>> 
>>> loaded via a namespace (and not attached):
>>> [1] parallel_2.16.0 stats4_2.16.0   tools_2.16.0
>>> 
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 



More information about the Bioconductor mailing list