[BioC] GRanges setdiff With Features on Both Strands

Hervé Pagès hpages at fhcrc.org
Mon Jan 23 21:32:38 CET 2012


Hi Michael, Malcolm,

On 01/23/2012 11:14 AM, Michael Lawrence wrote:
>
>
> On Mon, Jan 23, 2012 at 7:12 AM, Cook, Malcolm <MEC at stowers.org
> <mailto:MEC at stowers.org>> wrote:
>
>      > > As for the weird behavior you get with ignore.strand=TRUE, I agree
>      > > it doesn't make much sense and it's not totally clear to me
>     what the
>      > > correct behavior should be. Sadly the man page doesn't say anything
>      > > about what the intended behavior is (there is only a brief note
>     about
>      > > what ignore.strand=TRUE does for parallel set operations), gives no
>      > > example, and this case is not covered by the unit tests either.
>      > >
>      > > However, by looking at what union() and gaps() do with
>      > > ignore.strand=TRUE, and also at the internals of the GenomicRanges
>      > > package, it seems to me that the intention was that:
>      > >
>      > >  union(x, y, ignore.strand=TRUE)
>      > >  intersect(x, y, ignore.strand=TRUE)
>      > >  setdiff(x, y, ignore.strand=TRUE)
>      > >
>      > > would be equivalent to:
>      > >
>      > >  union(`strand<-`(x, "+"), `strand<-`(y, "+"))
>      > >  intersect(`strand<-`(x, "+"), `strand<-`(y, "+"))
>      > >  setdiff(`strand<-`(x, "+"), `strand<-`(y, "+"))
>      > >
>      > > So I could fix setdiff() and intersect() to do this
>     (intersect() has
>      > > a weird behavior too), and update the documentation. Does that
>     sound
>      > > reasonable?
>      > >
>      > >
>      > This would be reasonable to me.
>
>     I think that retaining `ignore.strand` with the proposed semantics
>     of (potentially) altering the strand is inviting further confusion.
>
>     For example, it would introduce the possibility that the union of x
>     with itself is not all.equal to itself only in the case when the
>     strand of x is not '+'.
>
>     I think this is a markedly poor outcome since `ignore.strand` is
>     clearly NOT what the operation would be doing.
>
>
> I would expect that a set operation ignoring strand would generate
> ranges with "*" as the strand. There's no way to reliably preserve the
> strand. So I don't see how the original strand would matter.

I think that Malcom's point is that it's an unpleasant surprise that
some very fundamental and intuitive mathematical properties are not
honored, e.g. the union (or intersection) of x with itself is not
itself. I can live with that one though: this happens only if you
use ignore.strand=TRUE, which is not the default. And we still have
the property that the union (or intersection) of x with itself is
itself *modulo* the strand. Probably good enough.

Now whether

   union(x, y, ignore.strand=TRUE)
   intersect(x, y, ignore.strand=TRUE)
   setdiff(x, y, ignore.strand=TRUE)

should be equivalent to

   union(`strand<-`(x, "+"), `strand<-`(y, "+"))
   intersect(`strand<-`(x, "+"), `strand<-`(y, "+"))
   setdiff(`strand<-`(x, "+"), `strand<-`(y, "+"))

or to

   union(`strand<-`(x, "*"), `strand<-`(y, "*"))
   intersect(`strand<-`(x, "*"), `strand<-`(y, "*"))
   setdiff(`strand<-`(x, "*"), `strand<-`(y, "*"))

is just a cosmetic question but I agree that the output of the latter
will look prettier (and might be slightly less confusing). So I can do
this. Does anybody object?

>
>      > > Finally I'd like to mention that I see very little value in
>     supporting
>      > > the 'ignore.strand' arg in those 3 methods so another option
>     would be
>      > > to just get rid of it.
>      > >
>      > >
>      > I think there is value here. Often, one wants to ignore the
>     strand in these
>      > operations (say between reads and transcripts) without destroying the
>      > strand information. Creating temporary objects is a nuisance.
>
>     Perhaps a compromise then.....
>
>     Since the implementation appears to have been undocumented and the
>     intention unclear, how about replace `ignore.strand` option with a
>     `with.strand=c('+','-','*')` option which sets the GRanges strand
>     prior to performing the operation.
>
>
>
> I would prefer keeping consistent with the rest of the API and having
> ignore.strand simply be the equivalent of with.strand="*".  I
> acknowledge that the meaning of "*" is ambiguous, but it's easily
> preferred over "+" and "-".
>

Yes a lot of methods already have the 'ignore.strand' arg so if we're
going to keep that feature for union(), intersect() and setdiff(), I'd
rather stick to that API too.

Cheers,
H.



More information about the Bioconductor mailing list