[BioC] findOverlaps method in GenomicRanges not supporting type="equal" for GRangesList, GRangesList?

Hervé Pagès hpages at fhcrc.org
Fri Nov 22 23:13:04 CET 2013


Hi Michael,

On 11/21/2013 12:59 PM, Michael Lawrence wrote:
> Ok. I think my code is broken anyway, in cases where ranges are repeated
> in one of the GRanges. Feel free to use some of it or delete it. As for
> the zero width ranges, I'm guessing people are usually looking for
> match()-like behavior, rather than findOverlaps() behavior, when
> type="equal", so we might need another interface?

My preference would be to keep the findOverlaps interface with a note
in the man page for findOverlaps,GRangesList,GRangesList about special
treatment of zero-width ranges.

> Also, I'm guessing
> that the hash-based match() is a lot faster than the interval-tree
> approach, so we might want to use that, except perhaps in the circular
> sequence case.

Yes, we should probably reuse the hash-based match() internally to
implement findOverlaps(type="equal"). Ranges on circular sequences
just need to be shifted by a multiple of the sequence length before
match() is called so their start is >= 1 and <= sequence length.
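For illustration, here is a minimal base-R sketch of that idea. It rests on several assumptions and is not the GenomicRanges implementation: ranges are plain data.frames (columns seqnames, start, end, strand) rather than GRanges, the helpers normalize_circular(), range_keys() and match_equal() are hypothetical, strand "*" gets no special treatment, and only the first equal subject range is reported per query, as with match().

```r
## Hypothetical sketch: ranges are plain data.frames with columns
## seqnames, start, end, strand (not GRanges objects).

## Shift each range on a circular sequence by a multiple of the sequence
## length so that its start lies in [1, seqlength]; ranges on linear
## sequences are left untouched.
normalize_circular <- function(df, seqlengths, is_circular) {
  seqs <- as.character(df$seqnames)
  circ <- unname(is_circular[seqs])
  len <- unname(seqlengths[seqs])
  new_start <- ifelse(circ, ((df$start - 1) %% len) + 1, df$start)
  shift <- new_start - df$start
  df$start <- as.integer(df$start + shift)
  df$end <- as.integer(df$end + shift)
  df
}

## One string key per range, so that the hash-based base::match() applies.
range_keys <- function(df) {
  paste(df$seqnames, df$start, df$end, df$strand, sep = ":")
}

## For each query range: index of the first subject range that is exactly
## equal to it after normalization, or NA if there is none.
match_equal <- function(query, subject, seqlengths, is_circular) {
  q <- normalize_circular(query, seqlengths, is_circular)
  s <- normalize_circular(subject, seqlengths, is_circular)
  match(range_keys(q), range_keys(s))
}

## Usage: chrM is treated as circular with length 100.
seqlens <- c(chr1 = 1000, chrM = 100)
circ <- c(chr1 = FALSE, chrM = TRUE)
query <- data.frame(seqnames = c("chrM", "chr1", "chr1", "chrM"),
                    start = c(105L, 10L, 50L, -3L),
                    end = c(110L, 20L, 60L, 2L),
                    strand = c("+", "+", "-", "+"),
                    stringsAsFactors = FALSE)
subject <- data.frame(seqnames = c("chr1", "chrM", "chrM"),
                      start = c(10L, 5L, 97L),
                      end = c(20L, 10L, 102L),
                      strand = c("+", "+", "+"),
                      stringsAsFactors = FALSE)
hits <- match_equal(query, subject, seqlens, circ)
```

Here chrM:105-110 is shifted to chrM:5-10 and chrM:-3-2 to chrM:97-102 before matching, so both find their equal subject ranges.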

Cheers,
H.

>
> Michael
>
>
>
>
> On Thu, Nov 21, 2013 at 12:02 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>     Hi Michael, Nico,
>
>     Right now match/== methods for List objects behave inconsistently.
>     For example, even for conceptually close objects like IntegerList
>     and XIntegerViews, we have:
>
>        x <- IntegerList(a=1:5, b=2:-3, c=1:3)
>        v <- successiveViews(unlist(x), elementLengths(x))
>
>        > x == rev(x)
>        LogicalList of length 3
>        [["a"]] TRUE TRUE TRUE FALSE FALSE
>        [["b"]] TRUE TRUE TRUE TRUE TRUE TRUE
>        [["c"]] TRUE TRUE TRUE FALSE FALSE
>
>        > v == rev(v)
>        [1] FALSE  TRUE FALSE
>
>        > match(x, rev(x))
>        IntegerList of length 3
>        [["a"]] 1 2 3 <NA> <NA>
>        [["b"]] 1 2 3 4 5 6
>        [["c"]] 1 2 3
>
>        > match(v, rev(v))
>        Error in base::match(x, table, nomatch = nomatch, incomparables = incomparables,  :
>          'match' requires vector arguments
>
>     This is not a good situation, and some work still needs to be done at
>     some point in the future to clean up the match/== methods in
>     IRanges/GenomicRanges. In the meantime, I think we should hold off on
>     adding new methods for List objects until there is a clear consensus
>     on how they should behave.
>
>     As for Nico's request, I agree that the best way to go would be to just
>     make findOverlaps(type="equal") work. There are some subtle semantic
>     differences between a *match* (as reported by match or ==) and equality
>     from a range-overlap point of view. The former can report equality
>     for ranges on a circular sequence that are not considered equal by
>     the latter. Another difference is how zero-width ranges are handled.
>
>     Thanks,
>     H.
>
>
>
>     On 11/21/2013 10:43 AM, Michael Lawrence wrote:
>
>         So I've checked into devel a match,GRangesList,GRangesList method.
>         This allows findMatches() to return what you want. There is one
>         question to settle before this is approved, though: should match()
>         act like findOverlaps() and consider each GRanges atomically (one
>         returned index per GRanges), or should match() behave as it does for
>         other Lists and return an IntegerList with one value per range,
>         grouped by the top-level elements? If we decide on the latter, the
>         method I wrote needs to be removed and its implementation moved to
>         the "equal" mode of findOverlaps(). Either way,
>         findOverlaps(type="equal") should be made to work.
>
>         Michael
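To make the two options concrete, here is a minimal base-R sketch. It is not the method checked into devel: plain character vectors of hypothetical "seqname:start-end:strand" keys stand in for GRanges, a named list of them stands in for a GRangesList, and only the shape of each result is meant to be illustrative.

```r
## Hypothetical stand-in: each GRanges is a character vector of
## "seqname:start-end:strand" keys; a GRangesList is a named list of them.
query <- list(tx1 = c("chr1:1-10:+", "chr1:20-30:+"),
              tx2 = "chr2:5-8:-")
subject <- list(txA = "chr2:5-8:-",
                txB = c("chr1:1-10:+", "chr1:20-30:+"))

## Option 1 (findOverlaps-like): each GRanges is compared as a whole,
## giving one index per top-level element. Collapsing with ";" makes the
## comparison sensitive to the order of ranges within an element.
whole_key <- function(l) vapply(l, paste, character(1), collapse = ";")
atomic <- match(whole_key(query), whole_key(subject))

## Option 2 (as for other Lists, e.g. IntegerList): query[[i]] is matched
## against subject[[i]] in parallel, giving one value per range, grouped
## by the top-level elements.
parallel <- mapply(match, query, subject, SIMPLIFY = FALSE)
```

With option 1, tx1 matches txB and tx2 matches txA (atomic is c(2, 1)). With option 2, every range gets NA, because tx1 is only compared with txA and tx2 only with txB.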
>
>
>         On Thu, Nov 21, 2013 at 8:13 AM, Nicolas Delhomme
>         <nicolas.delhomme at umu.se> wrote:
>
>             Thanks!
>             ---------------------------------------------------------------
>             Nicolas Delhomme
>
>             Nathaniel Street Lab
>             Department of Plant Physiology
>             Umeå Plant Science Center
>
>             Tel: +46 90 786 7989
>             Email: nicolas.delhomme at plantphys.umu.se
>             SLU - Umeå universitet
>             Umeå S-901 87 Sweden
>             ---------------------------------------------------------------
>
>             On 21 Nov 2013, at 17:06, Michael Lawrence
>             <lawrence.michael at gene.com> wrote:
>
>                 I will work on this today.
>
>                 Michael
>
>
>                 On Thu, Nov 21, 2013 at 4:43 AM, Nicolas Delhomme
>                 <nicolas.delhomme at umu.se> wrote:
>
>                 Hi Bioc!
>
>                 When I try to find “equal” ranges from two GRangesList
>                 objects, I get the following error:
>
>                     findOverlaps(query=grng.def, subject=grng.mod, type="equal")
>
>                 Error in match.arg(type) :
>                     'arg' should be one of “any”, “start”, “end”, “within”
>
>                 Isn’t type="equal" supported for the GRangesList,
>                 GRangesList signature?
>
>                 Cheers,
>
>                 Nico
>
>                 sessionInfo()
>                 R version 3.0.2 (2013-09-25)
>                 Platform: x86_64-apple-darwin13.0.0 (64-bit)
>
>                 locale:
>                 [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
>                 attached base packages:
>                 [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
>                 other attached packages:
>                  [1] easyRNASeq_1.8.2       ShortRead_1.20.0       Rsamtools_1.14.1       GenomicRanges_1.14.3   DESeq_1.14.0           lattice_0.20-24        locfit_1.5-9.1
>                  [8] Biostrings_2.30.1      XVector_0.2.0          IRanges_1.20.5         edgeR_3.4.0            limma_3.18.3           biomaRt_2.18.0         Biobase_2.22.0
>                 [15] genomeIntervals_1.18.0 BiocGenerics_0.8.0     intervals_0.14.0
>
>                 loaded via a namespace (and not attached):
>                  [1] annotate_1.40.0      AnnotationDbi_1.24.0 bitops_1.0-6         DBI_0.2-7            genefilter_1.44.0    geneplotter_1.40.0   grid_3.0.2           hwriter_1.3
>                  [9] latticeExtra_0.6-26  LSD_2.5              RColorBrewer_1.0-5   RCurl_1.95-4.1       RSQLite_0.11.4       splines_3.0.2        stats4_3.0.2         survival_2.37-4
>                 [17] tools_3.0.2          XML_3.98-1.1         xtable_1.7-1         zlibbioc_1.8.0
>
>
>
>
>                 _______________________________________________
>                 Bioconductor mailing list
>                 Bioconductor at r-project.org
>                 https://stat.ethz.ch/mailman/listinfo/bioconductor
>                 Search the archives:
>                 http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
>
>
>
>
>
>
>
>
>     --
>     Hervé Pagès
>
>     Program in Computational Biology
>     Division of Public Health Sciences
>     Fred Hutchinson Cancer Research Center
>     1100 Fairview Ave. N, M1-B514
>     P.O. Box 19024
>     Seattle, WA 98109-1024
>
>     E-mail: hpages at fhcrc.org
>     Phone: (206) 667-5791
>     Fax: (206) 667-1319
>
>



