[BioC] bug in pintersect in GenomicRanges

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Mon Nov 12 23:23:18 CET 2012


On Mon, Nov 12, 2012 at 4:28 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
>
>
> On Mon, Nov 12, 2012 at 11:00 AM, Kasper Daniel Hansen
> <kasperdanielhansen at gmail.com> wrote:
>>
>> Hmm, ok.
>>
>> Well, an alternative is to let the output of pintersect be shorter
>> than the two inputs.
>
>
> I would really prefer not to violate that contract of the p* functions.  If
> there are any empty ranges, it's not easy to figure out how the results
> match up with the input.

I can see this point of view.  But one could use findOverlaps. I am
just advocating for my position which is (1) inconsistency with
intersect() is not great and (2) I believe I could easily be caught by
not noticing a zero-width IRanges.

Anyway, reading ?pintersect very carefully, it does discuss these
things in some detail, although the paragraph is not consistent with
itself (as I read it, it start by saying that an error is thrown if an
empty intersect is being produced, but then it contradicts itself
later on).

Reflecting a bit, I guess I would like a new option for resolve.empty="remove"

Kasper



>
> Michael
>
>>
>>  Also, compare to
>>   intersect(gr1[2], gr2)
>> which is empty.
>>
>> Kasper
>>
>>
>>
>> On Mon, Nov 12, 2012 at 1:46 PM, Michael Lawrence
>> <lawrence.michael at gene.com> wrote:
>> > The end = start - 1 encoding for zero-width Ranges objects has existed
>> > as
>> > long as the IRanges package. Having end < start is obviously not ideal,
>> > but
>> > there are no clearly superior alternatives.
>> >
>> > Michael
>> >
>> >
>> > On Mon, Nov 12, 2012 at 8:58 AM, Kasper Daniel Hansen
>> > <kasperdanielhansen at gmail.com> wrote:
>> >>
>> >> Well, I am surprised by the zero width IRanges with start and end.
>> >> The choice of start = 5 and end = 4 is - as far as I can see -
>> >> completely arbitrary.  I also don't like the fact that start() and
>> >> end() just reports 5 and 4 - I could see myself making mistakes in my
>> >> code, because I believe I assume start <= end.
>> >>
>> >> But perhaps this is because it is now allowed to have zero-width
>> >> IRanges?
>> >>
>> >> Kasper
>> >>
>> >>
>> >> On Mon, Nov 12, 2012 at 6:20 AM, Vincent Carey
>> >> <stvjc at channing.harvard.edu> wrote:
>> >> > does this contradict the doc in some way?  the second range of your
>> >> > result
>> >> > has width zero, which i think is correct.
>> >> >
>> >> > On Sun, Nov 11, 2012 at 12:16 PM, Kasper Daniel Hansen
>> >> > <kasperdanielhansen at gmail.com> wrote:
>> >> >>
>> >> >> Example
>> >> >>
>> >> >> library(GenomicRanges)
>> >> >> gr1 = GRanges(seqnames = c("chr1", "chr1"), ranges = IRanges(start =
>> >> >> c(1,5), width = 7))
>> >> >> gr2 = GRanges(seqnames = "chr1", ranges = IRanges(start = 1, end =
>> >> >> 4))
>> >> >> pintersect(gr1, rep(gr2, 2))
>> >> >>
>> >> >> GRanges with 2 ranges and 0 metadata columns:
>> >> >>       seqnames    ranges strand
>> >> >>          <Rle> <IRanges>  <Rle>
>> >> >>   [1]     chr1    [1, 4]      *
>> >> >>   [2]     chr1    [5, 4]      *
>> >> >>   ---
>> >> >>   seqlengths:
>> >> >>    chr1
>> >> >>      NA
>> >> >>
>> >> >> The second element in the return GRanges should be empty.  I would
>> >> >> therefore expect that the return length of the pintersect would have
>> >> >> length 1, despite the fact that the first object has length 2.
>> >> >>
>> >> >> Kasper
>> >> >>
>> >> >> _______________________________________________
>> >> >> Bioconductor mailing list
>> >> >> Bioconductor at r-project.org
>> >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >> >> Search the archives:
>> >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >> >
>> >> >
>> >>
>> >> _______________________________________________
>> >> Bioconductor mailing list
>> >> Bioconductor at r-project.org
>> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >> Search the archives:
>> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>> >
>
>



More information about the Bioconductor mailing list