[BioC] nearest() for GRanges

Dan Tenenbaum dtenenba at fhcrc.org
Tue Jun 26 00:09:58 CEST 2012


On Mon, Jun 25, 2012 at 2:53 PM, Cook, Malcolm <MEC at stowers.org> wrote:
> Hi Valerie,
>
> Indeed good news.
>
> However, I am finding that this newest version is not yet available view
> biocLite from repository at bioconductor.org.  I am still picking up 1.8.6
> with biocLite('GenomicRanges').
>
> Should I expect to wait, or perhaps is there a 'push' at your end that
> needs attending?
>
> Please advise if I'm expecting it to appear before its time ;)

Out build cycle runs once a day, so expect to see the next version
tomorrow morning around 10AM Seattle time. If you want to get it
before then, you can check it out from the svn repository.

Thanks,
Dan


>
> Thanks!
>
> Malcolm
>
>
>
> On 6/25/12 1:53 PM, "Valerie Obenchain" <vobencha at fhcrc.org> wrote:
>
>>This is now fixed in release, BiocC 2.10, GenomicRanges 1.8.7.
>>
>>Note the behavior of '*' is different from previous behavior (i.e., <= v
>>1.8.6). Treatment of '*' ranges was one of the aspects we clarified and
>>enforced in the the recent update of precede, follows and nearest.
>>
>>Previously in release '*' was treated as a '+' range,
>>
>>g <- GRanges("chr1", IRanges(c(1,5,10), c(2,7,12)), "*")
>> > g
>>GRanges with 3 ranges and 0 elementMetadata cols:
>>       seqnames    ranges strand
>><Rle> <IRanges> <Rle>
>>   [1]     chr1  [ 1,  2]      *
>>   [2]     chr1  [ 5,  7]      *
>>   [3]     chr1  [10, 12]      *
>>   ---
>>   seqlengths:
>>    chr1
>>      NA
>> > precede(g)
>>[1]  2  3 NA
>> > follow(g)
>>[1] NA  1  2
>> > nearest(g)
>>[1] 2 1 2
>>
>>
>>The new behavior of '*' (in both release and devel) considers both '+'
>>and '-' possibilities. For details see the 'matching by strand' section
>>described in precede() on the man page for ?GRanges.
>>
>> > precede(g)
>>[1] 2 1 2
>> > follow(g)
>>[1] 2 1 2
>> > nearest(g)
>>[1] 2 1 2
>>
>>
>>Valerie
>>
>>On 06/22/2012 03:25 PM, Cook, Malcolm wrote:
>>> Great news, Valerie... thanks very much... I will take immediate
>>>advantage
>>> of this... after re-reading your report of 'an overhaul' I would well
>>> understand if back-porting your fix in dev to release would be onerous
>>>to
>>> impossible.
>>>
>>> I hope it goes quickly and smoothly....
>>>
>>> Cheers,
>>>
>>> Malcolm
>>>
>>>
>>> On 6/22/12 4:00 PM, "Valerie Obenchain"<vobencha at fhcrc.org>  wrote:
>>>
>>>> On 06/20/2012 05:20 PM, Cook, Malcolm wrote:
>>>>> Hi Valerie,
>>>>>
>>>>> Very glad you found and fixed the root cause.
>>>>>
>>>>> I don't know the overhead it would take for you, but, this being a
>>>>> regression, might you consider fixing in Bioconductor 2.10 as, say
>>>>> GenomicRanges_1.8.
>>>>>
>>>> Yes, I will fix this in release too. If not today then first thing next
>>>> week.
>>>>
>>>> Valerie
>>>>> Thanks for your consideration,
>>>>>
>>>>> Malcolm
>>>>>
>>>>> On 6/20/12 3:13 PM, "Valerie Obenchain"<vobencha at fhcrc.org>   wrote:
>>>>>
>>>>>> Hi Oleg, Malcom,
>>>>>>
>>>>>> Thanks for the bug report. This is now fixed in devel 1.9.28.  Over
>>>>>>the
>>>>>> past months we've done an overhaul of the precede/follow code in
>>>>>>devel.
>>>>>> The new nearest method is based on the new precede and follow and is
>>>>>> documented at
>>>>>>
>>>>>> ?'nearest,GenomicRanges,GenomicRanges-method'
>>>>>>
>>>>>> Let me know if you run into problems.
>>>>>>
>>>>>> Valerie
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 06/18/2012 02:25 PM, Cook, Malcolm wrote:
>>>>>>> Martin, Oleg, Val, all,
>>>>>>>
>>>>>>> I too have script logic that benefitted from and depends upon what
>>>>>>>the
>>>>>>> behavior of nearest,GenomicRanges,missing as reported by Oleg.
>>>>>>>
>>>>>>> Thanks for the unit tests Martin.
>>>>>>>
>>>>>>> If it helps in sleuthing, in my hands, the 3rd test used to pass (if
>>>>>>> my
>>>>>>> memory serves), but does not pass now, as the attached transcript
>>>>>>> shows.
>>>>>>>
>>>>>>> Hoping it helps find a speedy resolution to this issue,
>>>>>>>
>>>>>>> ~ Malcolm Cook
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>     r<- IRanges(c(1,5,10), c(2,7,12))
>>>>>>>>     g<- GRanges("chr1", r, "+")
>>>>>>>>     checkEquals(precede(r), precede(g))
>>>>>>> [1] TRUE
>>>>>>>>      checkEquals(follow(r), follow(g))
>>>>>>> [1] TRUE
>>>>>>>>     try(checkEquals(nearest(r), nearest(g)))
>>>>>>> Error in checkEquals(nearest(r), nearest(g)) :
>>>>>>>      Mean relative difference: 0.6
>>>>>>>
>>>>>>>
>>>>>>>> sessionInfo()
>>>>>>> R version 2.15.0 (2012-03-30)
>>>>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>>>>>
>>>>>>> locale:
>>>>>>> [1] C
>>>>>>>
>>>>>>> attached base packages:
>>>>>>>     [1] tools     splines   parallel  stats     graphics  grDevices
>>>>>>> utils
>>>>>>> datasets  methods   base
>>>>>>>
>>>>>>> other attached packages:
>>>>>>>     [1] RUnit_0.4.26          log4r_0.1-4           vwr_0.1
>>>>>>> RecordLinkage_0.4-1   ffbase_0.5            ff_2.2-7
>>>>>>> bit_1.1-8             evd_2.2-6             ipred_0.8-13
>>>>>>> prodlim_1.3.1         KernSmooth_2.23-7     nnet_7.3-1
>>>>>>> survival_2.36-14      mlbench_2.1-0         MASS_7.3-18
>>>>>>> ada_2.0-2             rpart_3.1-53          e1071_1.6
>>>>>>> class_7.3-3           XLConnect_0.1-9       XLConnectJars_0.1-4
>>>>>>> rJava_0.9-3           latticeExtra_0.6-19   RColorBrewer_1.0-5
>>>>>>> lattice_0.20-6        doMC_1.2.5            multicore_0.1-7
>>>>>>> [28] BSgenome_1.24.0       rtracklayer_1.16.1    Rsamtools_1.8.5
>>>>>>> Biostrings_2.24.1     GenomicFeatures_1.8.1 AnnotationDbi_1.18.1
>>>>>>> GenomicRanges_1.8.6   IRanges_1.14.3        Biobase_2.16.0
>>>>>>> BiocGenerics_0.2.0    data.table_1.8.0      compare_0.2-3
>>>>>>> svUnit_0.7-10         doParallel_1.0.1      iterators_1.0.6
>>>>>>> foreach_1.4.0         ggplot2_0.9.1         sqldf_0.4-6.4
>>>>>>> RSQLite.extfuns_0.0.1 RSQLite_0.11.1        chron_2.3-42
>>>>>>> gsubfn_0.6-3          proto_0.3-9.2         DBI_0.2-5
>>>>>>> functional_0.1        reshape_0.8.4         plyr_1.7.1
>>>>>>> [55] stringr_0.6           gtools_2.6.2
>>>>>>>
>>>>>>> loaded via a namespace (and not attached):
>>>>>>>     [1] RCurl_1.91-1     XML_3.9-4        biomaRt_2.12.0
>>>>>>> bitops_1.0-4.1
>>>>>>> codetools_0.2-8  colorspace_1.1-1 compiler_2.15.0  dichromat_1.2-4
>>>>>>> digest_0.5.2     grid_2.15.0      labeling_0.1     memoise_0.1
>>>>>>> munsell_0.3      reshape2_1.2.1   scales_0.2.1     stats4_2.15.0
>>>>>>> tcltk_2.15.0     zlibbioc_1.2.0
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 6/18/12 2:39 PM, "Martin Morgan"<mtmorgan at fhcrc.org>    wrote:
>>>>>>>
>>>>>>>> Hi Oleg --
>>>>>>>>
>>>>>>>> On 06/17/2012 11:11 PM, Oleg Mayba wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I just noticed that a piece of logic I was relying on with GRanges
>>>>>>>>> before
>>>>>>>>> does not seem to work anymore.  Namely, I expect the behavior of
>>>>>>>>> nearest()
>>>>>>>>> with a single GRanges object as an argument to be the same as that
>>>>>>>>> of
>>>>>>>>> IRanges (example below), but it's not anymore.  I expect
>>>>>>>>> nearest(GR1)
>>>>>>>>> NOT
>>>>>>>>> to behave trivially but to return the closest range OTHER than the
>>>>>>>>> range
>>>>>>>>> itself.  I could swear that was the case before, but isn't any
>>>>>>>>> longer:
>>>>>>>> I think you're right that there is an inconsistency here; Val will
>>>>>>>> likely help clarify in a day or so. My two cents...
>>>>>>>>
>>>>>>>> I think, certainly, that GRanges on a single chromosome on the "+"
>>>>>>>> strand should behave like an IRanges
>>>>>>>>
>>>>>>>>      library(GenomicRanges)
>>>>>>>>      library(RUnit)
>>>>>>>>
>>>>>>>>      r<- IRanges(c(1,5,10), c(2,7,12))
>>>>>>>>      g<- GRanges("chr1", r, "+")
>>>>>>>>
>>>>>>>>      ## first two ok, third should work but fails
>>>>>>>>      checkEquals(precede(r), precede(g))
>>>>>>>>      checkEquals(follow(r), follow(g))
>>>>>>>>      try(checkEquals(nearest(r), nearest(g)))
>>>>>>>>
>>>>>>>> Also, on the "-" strand I think we're expecting
>>>>>>>>
>>>>>>>>      g<- GRanges("chr1", r, "-")
>>>>>>>>
>>>>>>>>      ## first two ok, third should work but fails
>>>>>>>>      checkEquals(follow(r), precede(g))
>>>>>>>>      checkEquals(precede(r), follow(g))
>>>>>>>>      try(checkEquals(nearest(r), nearest(g)))
>>>>>>>>
>>>>>>>> For "*" (which was your example) I think the situation is (a)
>>>>>>>> different
>>>>>>>> in devel than in release; and (b) not so clear. In devel, "*" is
>>>>>>>> (from
>>>>>>>> method?"nearest,GenomicRanges,missing") "x on '*' strand can match
>>>>>>>>to
>>>>>>>> ranges on any of ''+'', ''-'' or ''*''" and in particular I think
>>>>>>>> these
>>>>>>>> are always true:
>>>>>>>>
>>>>>>>>      checkEquals(precede(g), follow(g))
>>>>>>>>      checkEquals(nearest(r), follow(g))
>>>>>>>>
>>>>>>>> I would also expect
>>>>>>>>
>>>>>>>>      try(checkEquals(nearest(g), follow(g)))
>>>>>>>>
>>>>>>>> though this seems not to be the case. In 'release', "*" is coereced
>>>>>>>> and
>>>>>>>> behaves as if on the "+" strand (I think).
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>>>>> z=IRanges(start=c(1,5,10), end=c(2,7,12))
>>>>>>>>>> z
>>>>>>>>> IRanges of length 3
>>>>>>>>>         start end width
>>>>>>>>> [1]     1   2     2
>>>>>>>>> [2]     5   7     3
>>>>>>>>> [3]    10  12     3
>>>>>>>>>> nearest(z)
>>>>>>>>> [1] 2 1 2
>>>>>>>>>> z=GRanges(seqnames=rep('chr1',3),ranges=IRanges(start=c(1,5,10),
>>>>>>>>> end=c(2,7,12)))
>>>>>>>>>> z
>>>>>>>>> GRanges with 3 ranges and 0 elementMetadata cols:
>>>>>>>>>           seqnames    ranges strand
>>>>>>>>>              <Rle>     <IRanges>      <Rle>
>>>>>>>>>       [1]     chr1  [ 1,  2]      *
>>>>>>>>>       [2]     chr1  [ 5,  7]      *
>>>>>>>>>       [3]     chr1  [10, 12]      *
>>>>>>>>>       ---
>>>>>>>>>       seqlengths:
>>>>>>>>>        chr1
>>>>>>>>>          NA
>>>>>>>>>> nearest(z)
>>>>>>>>> [1] 1 2 3
>>>>>>>>>> sessionInfo()
>>>>>>>>> R version 2.15.0 (2012-03-30)
>>>>>>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>>>>>>
>>>>>>>>> locale:
>>>>>>>>>      [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>>>>>      [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>>>>>      [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>>>>>>>      [7] LC_PAPER=C                 LC_NAME=C
>>>>>>>>>      [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>>>>
>>>>>>>>> attached base packages:
>>>>>>>>> [1] datasets  utils     grDevices graphics  stats     methods
>>>>>>>>>base
>>>>>>>>>
>>>>>>>>> other attached packages:
>>>>>>>>> [1] GenomicRanges_1.8.6 IRanges_1.14.3      BiocGenerics_0.2.0
>>>>>>>>>
>>>>>>>>> loaded via a namespace (and not attached):
>>>>>>>>> [1] stats4_2.15.0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I want the IRanges behavior and not what seems currently to be the
>>>>>>>>> GRanges
>>>>>>>>> behavior, since I have some code that depends on it. Is there a
>>>>>>>>> quick
>>>>>>>>> way
>>>>>>>>> to make nearest() do that for me again?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> Oleg.
>>>>>>>>>
>>>>>>>>>        [[alternative HTML version deleted]]
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioconductor mailing list
>>>>>>>>> Bioconductor at r-project.org
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>>> Search the archives:
>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>> --
>>>>>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>>>>>> 1100 Fairview Ave. N.
>>>>>>>> PO Box 19024 Seattle, WA 98109
>>>>>>>>
>>>>>>>> Location: Arnold Building M1 B861
>>>>>>>> Phone: (206) 667-2793
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioconductor mailing list
>>>>>>>> Bioconductor at r-project.org
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>> Search the archives:
>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at r-project.org
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives:
>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list