[BioC] findOverlaps method in GenomicRanges not supporting type="equal" for GRangesList, GRangesList?

Hervé Pagès hpages at fhcrc.org
Thu Nov 21 21:05:20 CET 2013



On 11/21/2013 12:02 PM, Hervé Pagès wrote:
> Hi Michael, Nico,
>
> Right now match/== methods for List objects behave inconsistently.
> For example, even for conceptually close objects like IntegerList
> and XIntegerViews, we have:
>
>    x <- IntegerList(a=1:5, b=2:-3, c=1:3)
>    v <- successiveViews(unlist(x), elementLengths(x))
>
>    > x == rev(x)
>    LogicalList of length 3
>    [["a"]] TRUE TRUE TRUE FALSE FALSE
>    [["b"]] TRUE TRUE TRUE TRUE TRUE TRUE
>    [["c"]] TRUE TRUE TRUE FALSE FALSE
>
>    > v == rev(v)
>    [1] FALSE  TRUE FALSE
>
>    > match(x, rev(x))
>    IntegerList of length 3
>    [["a"]] 1 2 3 <NA> <NA>
>    [["b"]] 1 2 3 4 5 6
>    [["c"]] 1 2 3
>
>    > match(v, rev(v))
>    Error in base::match(x, table, nomatch = nomatch, incomparables = incomparables,  :
>      'match' requires vector arguments
>
> This is not a good situation, and some work still needs to be done at
> some point in the future to clean up the match/== methods in
> IRanges/GenomicRanges. In the meantime I think we should hold off on
> adding new methods for List objects until there is a clear consensus on
> how they should behave.
>
> As for Nico's request, I agree that the best way to go would be to just
> make findOverlaps(type="equal") work. There are some subtle semantic
> differences between a *match* (as reported by match or ==), and equality
> from a range overlap point of view. The former can report equality
> for ranges on a circular sequence that are not considered equal for
> the latter.

It's the other way around, sorry:

   The *latter* can report equality for ranges on a circular sequence
   that are not considered equal for the *former*.
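
To make the distinction concrete, here is a minimal sketch, assuming a
current GenomicRanges installation. The circular sequence "chrC" and its
length of 100 are made up for illustration. The results are deliberately
not shown, since how circularity is handled may depend on the package
version:

```r
library(GenomicRanges)

# Hypothetical circular sequence of length 100 (illustration only).
si <- Seqinfo(seqnames = "chrC", seqlengths = 100L, isCircular = TRUE)

# Two ranges that cover the same positions once coordinates wrap around:
# 101..110 on a circular sequence of length 100 corresponds to 1..10.
a <- GRanges("chrC", IRanges(start = 1,   width = 10), seqinfo = si)
b <- GRanges("chrC", IRanges(start = 101, width = 10), seqinfo = si)

# match() compares the ranges as stored (seqnames, start, end, strand).
match(a, b)

# Overlap-based equality may take the circularity of "chrC" into account.
findOverlaps(a, b, type = "equal")
```

Comparing the two results on your own installation is the easiest way to
see whether the two notions of equality disagree for a given version.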

Cheers,
H.

> Another difference is how zero-width ranges are handled.
>
> Thanks,
> H.
>
>
> On 11/21/2013 10:43 AM, Michael Lawrence wrote:
>> So I've checked into devel a match,GRangesList,GRangesList method. This
>> allows findMatches() to return what you want. There is a question,
>> though, before this is approved: should match() act like findOverlaps
>> and consider each GRanges atomically (one returned index per GRanges),
>> or should it behave as it does for other Lists and return an
>> IntegerList, with a value per range, grouped by the top-level elements?
>> If we decide on the latter, then the method I wrote needs to be removed
>> and the implementation moved to the "equal" mode in findOverlaps.
>> Either way, findOverlaps(type="equal") should be made to work.
>>
>> Michael
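
The two candidate behaviours for match() described above can be sketched
in base R. This is only an illustration under stated assumptions: plain
lists of integer vectors stand in for GRangesList objects, and
match_atomic() and match_elementwise() are hypothetical helper names, not
functions from IRanges or GenomicRanges:

```r
# Plain lists stand in for GRangesList objects (illustration only).
x     <- list(a = 1:3, b = 4:6)
table <- list(p = c(3L, 1L), q = 4:6)

# "Atomic" semantics: each top-level element is compared as a whole and
# one index into 'table' (or NA) is returned per element of 'x'.
match_atomic <- function(x, table) {
  vapply(x, function(el) {
    hit <- which(vapply(table, identical, logical(1), el))
    if (length(hit) > 0L) hit[[1L]] else NA_integer_
  }, integer(1))
}

# "Element-wise" semantics: x[[i]] is matched against table[[i]] and one
# index per inner value is returned, grouped by top-level element.
match_elementwise <- function(x, table) {
  Map(match, x, table)
}

match_atomic(x, table)       # a = NA, b = 2
match_elementwise(x, table)  # a = 2 NA 1, b = 1 2 3
```

The first form returns a single integer vector, the second a list with
the same shape as x, which is why the two cannot share one return type.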
>>
>>
>> On Thu, Nov 21, 2013 at 8:13 AM, Nicolas Delhomme
>> <nicolas.delhomme at umu.se> wrote:
>>
>>> Thanks!
>>> ---------------------------------------------------------------
>>> Nicolas Delhomme
>>>
>>> Nathaniel Street Lab
>>> Department of Plant Physiology
>>> Umeå Plant Science Center
>>>
>>> Tel: +46 90 786 7989
>>> Email: nicolas.delhomme at plantphys.umu.se
>>> SLU - Umeå universitet
>>> Umeå S-901 87 Sweden
>>> ---------------------------------------------------------------
>>>
>>> On 21 Nov 2013, at 17:06, Michael Lawrence <lawrence.michael at gene.com> wrote:
>>>
>>>> I will work on this today.
>>>>
>>>> Michael
>>>>
>>>>
>>>> On Thu, Nov 21, 2013 at 4:43 AM, Nicolas Delhomme <nicolas.delhomme at umu.se> wrote:
>>>> Hej Bioc!
>>>>
>>>> When I try to find “equal” ranges from two GRangesList objects, I get
>>>> the following error:
>>>>
>>>>> findOverlaps(query=grng.def,subject=grng.mod,type="equal")
>>>> Error in match.arg(type) :
>>>>    'arg' should be one of “any”, “start”, “end”, “within”
>>>>
>>>> Isn’t type="equal" supported for the GRangesList, GRangesList signature?
>>>>
>>>> Cheers,
>>>>
>>>> Nico
>>>>
>>>> sessionInfo()
>>>> R version 3.0.2 (2013-09-25)
>>>> Platform: x86_64-apple-darwin13.0.0 (64-bit)
>>>>
>>>> locale:
>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> other attached packages:
>>>>  [1] easyRNASeq_1.8.2       ShortRead_1.20.0       Rsamtools_1.14.1       GenomicRanges_1.14.3   DESeq_1.14.0           lattice_0.20-24        locfit_1.5-9.1
>>>>  [8] Biostrings_2.30.1      XVector_0.2.0          IRanges_1.20.5         edgeR_3.4.0            limma_3.18.3           biomaRt_2.18.0         Biobase_2.22.0
>>>> [15] genomeIntervals_1.18.0 BiocGenerics_0.8.0     intervals_0.14.0
>>>>
>>>> loaded via a namespace (and not attached):
>>>>  [1] annotate_1.40.0      AnnotationDbi_1.24.0 bitops_1.0-6         DBI_0.2-7            genefilter_1.44.0    geneplotter_1.40.0   grid_3.0.2           hwriter_1.3
>>>>  [9] latticeExtra_0.6-26  LSD_2.5              RColorBrewer_1.0-5   RCurl_1.95-4.1       RSQLite_0.11.4       splines_3.0.2        stats4_3.0.2         survival_2.37-4
>>>> [17] tools_3.0.2          XML_3.98-1.1         xtable_1.7-1         zlibbioc_1.8.0
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>>
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


