[BioC] GenomicRanges: nearest() for GRanges not returning overlaps

James Perkins jperkins at biochem.ucl.ac.uk
Tue Jul 3 19:58:27 CEST 2012


Thanks,

But it appears in the example (using GenomicRanges_1.8.7), the second
range is being excluded? I.e. 3-7 matches 5-6 (and 1-3) and thus is

> subject2 <- GRanges(seqnames=rep('chr1',3),ranges=IRanges(c(3, 5, 12),
+ >> > c(3, 6, 12)))
> subject2 <- GRanges(seqnames=rep('chr1',3),ranges=IRanges(c(3, 5, 12), c(3, 6, 12)))
> query2
GRanges with 3 ranges and 0 elementMetadata cols:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1   [1,  2]      *
  [2]     chr1   [3,  7]      *
  [3]     chr1   [9, 10]      *
  ---
  seqlengths:
   chr1
     NA
> subject2
GRanges with 3 ranges and 0 elementMetadata cols:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1  [ 3,  3]      *
  [2]     chr1  [ 5,  6]      *
  [3]     chr1  [12, 12]      *
  ---
  seqlengths:
   chr1
     NA
> nearest(query2, subject2)
[1] 1 3 3
> nearest(subject2, query2)
[1] 1 1 3

Similarly

nearest(query2, query2)
[1] 2 1 2

should surely be "1 2 3"? Or have I missed something?

IRanges is as I would expect: > nearest(ranges(query2), ranges(subject2))
[1] 1 1 3


Jim


On 3 July 2012 19:40, Michael Lawrence <lawrence.michael at gene.com> wrote:
> The way IRanges works is that the self hits are excluded (by default) when
> only one argument is passed to nearest(). They should not be excluded when
> the same object is passed as both arguments.
>
> Michael
>
> On Tue, Jul 3, 2012 at 7:09 AM, James Perkins <jperkins at biochem.ucl.ac.uk>
> wrote:
>>
>> Hi,
>>
>> I read this:
>> https://mailman.stat.ethz.ch/pipermail/bioconductor/2012-June/046287.html
>>
>> However, after reading it I'm a little confused as to what the default
>> behaviour should be when using nearest with '*', and also how to get
>> back to what I believe was the previous, "IRanges" style behaviour.
>>
>> Is nearest on GRanges supposed to act like nearest on IRanges, or is
>> it instead supposed to return the nearest neighbour OTHER than the
>> query one?
>>
>> I.e.
>>
>>
>> > query <- IRanges(c(1, 3, 9), c(2, 7, 10))
>> > subject <- IRanges(c(3, 5, 12), c(3, 6, 12))
>> > nearest(query, subject)
>> [1] 1 1 3
>>
>> Because, it seems to be behaving differently, i.e. returning the
>> neighbour only, i.e.
>>
>>
>> > query2 <- GRanges(seqnames=rep('chr1',3),ranges=IRanges(c(1, 3, 9),
>> > c(2, 7, 10)))
>> > subject2 <- GRanges(seqnames=rep('chr1',3),ranges=IRanges(c(3, 5, 12),
>> > c(3, 6, 12)))
>> > nearest(query2, subject2)
>> [1] 1 3 2
>> >
>> > sessionInfo()
>> R version 2.15.1 (2012-06-22)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US             LC_NUMERIC=C
>>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] GenomicRanges_1.8.7 IRanges_1.14.4      BiocGenerics_0.2.0
>>
>> loaded via a namespace (and not attached):
>> [1] stats4_2.15.1
>>
>>
>> However, what makes me confused is that the previous behaviour was like
>> IRanges:
>>
>> > query <- IRanges(c(1, 3, 9), c(2, 7, 10))
>> > subject <- IRanges(c(3, 5, 12), c(3, 6, 12))
>> > nearest(query, subject)
>> [1] 1 1 3
>> >
>> > query2 <- GRanges(seqnames=rep('chr1',3),ranges=IRanges(c(1, 1, 12),
>> > c(2, 7, 12)))
>> > subject2 <- GRanges(seqnames=rep('chr1',3),ranges=IRanges(c(3, 5, 12),
>> > c(3, 6, 12)))
>> > nearest(query2, subject2)
>> [1] 1 1 3
>> >
>> > sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] GenomicRanges_1.8.6 IRanges_1.14.3      BiocGenerics_0.2.0
>>
>> loaded via a namespace (and not attached):
>> [1] stats4_2.15.0
>>
>>
>> And in fact, this has always been the behaviour right? i.e.: for R 2.13.1
>>
>> > query <- IRanges(c(1, 3, 9), c(2, 7, 10))
>> > subject <- IRanges(c(3, 5, 12), c(3, 6, 12))
>> > nearest(query, subject)
>> [1] 1 1 3
>> >
>> > query2 <- GRanges(seqnames=rep('chr1',3),ranges=IRanges(c(1, 1, 12),
>> > c(2, 7, 12)))
>> > subject2 <- GRanges(seqnames=rep('chr1',3),ranges=IRanges(c(3, 5, 12),
>> > c(3, 6, 12)))
>> > nearest(query2, subject2)
>> [1] 1 1 3
>> >
>> > sessionInfo()
>> R version 2.13.1 (2011-07-08)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] GenomicRanges_1.4.8 IRanges_1.10.6
>>
>>
>> Is ignore.strand=TRUE intended to get the IRanges-like behaviour?
>> Because I have problems with this too:
>>
>>  nearest(x = query2, subject = subject2, ignore.strand=TRUE)
>> Error in strand(x) <- strand(subject) <- "+" : object 'x' not found
>>
>> Thanks!
>>
>> Jim
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>



More information about the Bioconductor mailing list