[BioC] GenomicRanges::nearest (especially follow)

Valerie Obenchain vobencha at fhcrc.org
Sat Oct 13 01:17:03 CEST 2012


Hi Harris,

On 10/11/2012 02:27 PM, Harris A. Jaffee wrote:
> Apologies in advance if I just don't understand the ignore.strand switch, or
> perhaps GRanges objects.  Also, I have not even tried to understand
>
> 	GenomicRanges:::.GenomicRanges_findPrecedeFollow
>
> I have the impression that a query with strand entirely "*" essentially implies
> ignore.strand, but this case here is handled correctly only if ignore.strand is
> TRUE (consistent with distanceToNearest but not distance, which seems correct):
>
>> x = GRanges(ranges=IRanges(start=5, end=5), seqnames="chr1", strand="*")
>> Y = GRanges(ranges=IRanges(start=c(6,7), end=c(6,7)), seqnames="chr1", strand=c("+","-"))
>> distance(x, Y[1])
> [1] 1
>> distance(x, Y[2])
> [1] 2
>
>> nearest(x, Y, ignore.strand=TRUE)	# correct
> [1] 1
>> nearest(ranges(x), ranges(Y))       	# also correct
> [1] 1
>
> However,
>
>> nearest(x, Y)
> [1] 2

Thanks for reporting this bug. Now fixed in GenomicRanges 1.11.3 and 1.10.2.

  The problem was in .nearest() where I computed the preceding and 
following distance of the ranges. I was missing abs() so the distance 
for the second range was -2 instead of 2. When checking to see which 
range was closest

 > 1 < -2
[1] FALSE

so the second range won.
>> distanceToNearest(x, Y)
> DataFrame with 1 row and 3 columns
>   queryHits subjectHits  distance
>   <integer>    <integer>  <integer>
> 1         1           2         2
>
>> distanceToNearest(x, Y, ignore.strand=TRUE)
> DataFrame with 1 row and 3 columns
>    queryHits subjectHits  distance
>    <integer>    <integer>  <integer>
> 1         1           1         1
>
>
> Finally,
>
>> follow(ranges(x), ranges(Y))
> [1] NA
>
> The issue (along with GenomicRanges:::.nearest) must come down to this:
>
>> follow(x, Y)				# how can this be?
> [1] 2

This behavior is due to the fact that a '*' range can compare itself to 
either '+' or '-'.  The 'x' is '*" located at position 5.  When 
comparing it to the '+'  Y[1]
  we think of both ranges as '+'. precede() and follow() locations are 
determined by moving from 3' to 5'. On '+' strand this is from left to 
right so 5 precedes 6.

 > precede(x, Y[1])
[1] 1

but does not follow it

 > follow(x, Y[1])
[1] NA

When comparing the '*' to Y[2] we think of both ranges as '-'. On the 
'-' moving from  3' to 5' is right to left so the  5 follows 7

 > follow(x, Y[2])
[1] 1

but does not precede it

 > precede(x, Y[2])
[1] NA

When we set ignore.strand=TRUE, all ranges are considered '+'.

Valerie
> whereas
>
>> follow(x, Y, ignore.strand=TRUE)	# correct, I think
> [1] NA
>
>> sessionInfo()
> R Under development (unstable) (2012-10-10 r60908)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.iso885915       LC_NUMERIC=C
>   [3] LC_TIME=en_US.iso885915        LC_COLLATE=en_US.iso885915
>   [5] LC_MONETARY=en_US.iso885915    LC_MESSAGES=en_US.iso885915
>   [7] LC_PAPER=C                     LC_NAME=C
>   [9] LC_ADDRESS=C                   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base
>
> other attached packages:
> [1] GenomicRanges_1.11.0 IRanges_1.17.0       BiocGenerics_0.5.0
>
> loaded via a namespace (and not attached):
> [1] parallel_2.16.0 stats4_2.16.0   tools_2.16.0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list