[BioC] countMatches() (was: table for GenomicRanges)

Hervé Pagès hpages at fhcrc.org
Wed Jan 9 00:46:27 CET 2013


Thanks all for the feedback. Will do %over% and %within%. Hopefully we
can consider this the end of the thread :-b  I'll just post a quick
note on Bioc-devel when this is ready.

Cheers,
H.
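
For readers skimming the thread, here is a minimal base-R sketch of the semantics being proposed for %over% and %within%. It is only an illustration under stated assumptions: it uses a toy data.frame of closed integer intervals rather than IRanges/GRanges objects, and the helpers make_ranges() and overlaps_any() are hypothetical names invented for this sketch, not the IRanges implementation.

```r
# Toy closed integer intervals stored in a data.frame.
# Assumption: this stands in for a ranges object; it is NOT the IRanges class.
make_ranges <- function(start, end) {
  stopifnot(length(start) == length(end), all(start <= end))
  data.frame(start = start, end = end)
}

# For each query interval, report whether at least one subject interval
# overlaps it ("any"), contains it ("within"), or is identical to it ("equal").
# Hypothetical helper; only mimics the idea of a 'type' argument.
overlaps_any <- function(query, subject, type = c("any", "within", "equal")) {
  type <- match.arg(type)
  vapply(seq_len(nrow(query)), function(i) {
    qs <- query$start[i]
    qe <- query$end[i]
    hit <- switch(type,
      any    = qs <= subject$end & qe >= subject$start,
      within = qs >= subject$start & qe <= subject$end,
      equal  = qs == subject$start & qe == subject$end
    )
    any(hit)
  }, logical(1))
}

# Binary operators as thin convenience wrappers around the helper.
`%over%`   <- function(query, subject) overlaps_any(query, subject, type = "any")
`%within%` <- function(query, subject) overlaps_any(query, subject, type = "within")

peaks <- make_ranges(start = c(1, 8, 20), end = c(5, 12, 25))
genes <- make_ranges(start = c(4, 18), end = c(10, 30))

peaks %over% genes    # TRUE TRUE TRUE
peaks %within% genes  # FALSE FALSE TRUE
```

The point of the two operators is visible in the last two lines: the first two peaks touch a gene without lying inside it, so they count for %over% but not for %within%.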

On 01/08/2013 03:07 PM, Michael Lawrence wrote:
> I think %over% and maybe %within% are all that's needed. Could go to
> %start% and %end%.
>
> Michael
>
> On Tue, Jan 8, 2013 at 2:59 PM, Cook, Malcolm <MEC at stowers.org> wrote:
>
>     If we’re voting/brainstorming, I’d go for one operator for each value
>     that the ‘type’ arg of overlap can take on.
>
>     Thus:
>
>     %olStart%
>     %olEnd%
>     %olWithin%
>     %olAny% (perhaps with alias of just ‘%ol%’)
>     %olEqual% (which should be same as %in%, right)
>
>     Doh, I can’t stay away from this issue for some reason..... Anyway,
>     my 2 cents
>
>     ~Malcolm
>
>
>     *From:* Tim Triche, Jr. [mailto:tim.triche at gmail.com]
>     *Sent:* Tuesday, January 08, 2013 4:12 PM
>     *To:* Michael Lawrence
>     *Cc:* Hervé Pagès; Cook, Malcolm; Sean Davis; Vedran Franke;
>     bioconductor at r-project.org
>     *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>
>
>     Michael: your suggestion is both clearer and more concise than mine
>     was.  +1
>
>     (I prefer x %i% y %i% z rather than intersect(x, intersect(y, z))
>     for the same reason)
>
>
>     On Tue, Jan 8, 2013 at 2:03 PM, Michael Lawrence
>     <lawrence.michael at gene.com> wrote:
>
>     I would vote for %over% instead of %ov%. Just 2 more characters but
>     way clearer, at least to me. The hardest things to type are the %'s.
>
>     Michael
>
>     On Tue, Jan 8, 2013 at 11:09 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>         Thanks Tim, Malcolm for the feedback.
>
>         @Tim, I won't comment on the variants of %ov% you are proposing for
>         doing "within" or "equal" instead of "any" (but if people want them,
>         I'll add them too). For now I just want to focus on restoring the
>         convenience of the old %in%, whose removal is understandably causing
>         some frustration. And so we can move on.
>
>         Cheers,
>         H.
>
>         On 01/08/2013 09:50 AM, Tim Triche, Jr. wrote:
>
>             hell, I'll add the operators if there's support for them.
>             obviously they're not a big deal and a patch would take 5
>             minutes flat.
>
>             my hope was to be very explicit about what each type of
>             operation meant, so that when a newcomer to the Ranges API sees
>
>                 peaks %overlapping% promoters(someGroupOfGenesWeCareAbout)
>
>             it cannot be confused with
>
>                 peaks %within% rangesThatCorrespondToSomeChromatinState
>
>             or
>
>                 peaks %equal% aBunchOfDNAseFootprints
>
>             or
>
>                 DMRs %in% genes  ## what the hell does this really mean,
>                                  ## anyways? it's so bad on so many levels
>
>             because whenever someone says "what is the advantage of
>             Ranges-based analyses?", these are the archetypal sorts of
>             queries that come to mind. Except that usually in my examples
>             they are based on posterior probabilities, but perhaps that
>             could stand to change.
>
>             Anyways, that's just my bias, and you're doing the heavy
>             lifting.  But if people agree with the motivations I will write
>             the patch today.
>
>             Cheers,
>
>             --t
>
>
>
>
>             On Tue, Jan 8, 2013 at 9:20 AM, Hervé Pagès
>             <hpages at fhcrc.org> wrote:
>
>                  Hi Tim,
>
>                  I could add the %ov% operator as a replacement for the old
>                  %in%. So you would write 'peaks %ov% genes' instead of
>                  'peaks %in% genes'. Would just be a convenience wrapper for
>                  'overlapsAny(peaks, genes)'.
>
>                  Cheers,
>                  H.
>
>
>                  On 01/07/2013 11:45 AM, Tim Triche, Jr. wrote:
>
>                      So why not leave %in% as it was and transition everything
>                      forward to explicitly using { `%within%`,
>                      `%overlaps%`|`%overlapping%`, `%equals%` } such that
>
>                          identical( x %within% table,
>                                     countOverlaps(x, table, type='within') > 0 ) == TRUE
>                          identical( x %overlaps% table,
>                                     countOverlaps(x, table, type='any') > 0 ) == TRUE
>                          identical( x %equals% table,
>                                     countOverlaps(x, table, type='equal') > 0 ) == TRUE
>
>                      and for the time being,
>
>                          identical( x %overlaps% table,
>                                     countOverlaps(x, table, type='any') > 0 ) == TRUE
>                          ## but with a noisy nastygram that will halt if options("warn"=2)
>
>                      No breakage for %in% methods until such time as a full
>                      deprecation cycle has passed, and if the maintainers can't
>                      be arsed to do anything at all about the warnings by the
>                      second full release, then perhaps they don't really care
>                      that much after all.  Just a thought?
>
>                      From someone (me) who has their own issues with keeping
>                      everything up to date and should know better.  If you want
>                      to use %in% for
>
>                          peaks %in% genes (why on earth would you do this rather
>                          than peaks %in% promoters(genes), anyways?)
>
>                      then a nastygram could be emitted "WARNING: YOUR SHORTHAND
>                      NOTATION IS DOOMED AFTER BIOC 2.13, YOU WILL BE ASSIMILATED"
>                      and everyone is (more or less) happy.
>
>
>
>                      On Mon, Jan 7, 2013 at 11:33 AM, Michael Lawrence
>                      <lawrence.michael at gene.com> wrote:
>
>                           On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès
>                           <hpages at fhcrc.org> wrote:
>
>                               Hi Michael,
>
>                               I don't think "match" (the word) always has to
>                               mean "equality" either.
>                               However having match() (the function) do "whole
>                               exact matching" (aka "equality") for any kind of
>                               vector-like object has the advantage of:
>
>                                  (a) making it consistent with base::match()
>                                      (?base::match is pretty explicit about
>                                      what the contract of match() is)
>
>
>                           (a) alone is obviously not enough.  We have many
>                           methods, like the set operations, that treat ranges
>                           specially.  Are we going to start moving everything
>                           toward the base behavior? And have rangeIntersect,
>                           rangeSetdiff, etc?
>
>                                  (b) preserving its relationship with ==,
>                                      duplicated(), unique(), etc...
>
>
>                           So it becomes consistent with duplicated/unique, but
>                           we lose consistency with the set operations.
>
>                                  (c) not frustrating the user who needs
>                                      something to do exact matching on ranges
>                                      (as I mentioned previously, if you take
>                                      match() away from him/her, s/he'll be
>                                      left with nothing).
>
>
>                           No one has ever asked for match() to behave this
>                           way. There was a request for a way to tabulate
>                           identical ranges. It was a nice idea to extract the
>                           general "outer equal" findMatches function. But the
>                           changes seem to be snow-balling.  These types of
>                           changes mean a lot of maintenance work for the
>                           users. A deprecation cycle does not circumvent that.
>
>
>                               IMO those advantages counterbalance *by far* the
>                               very little convenience you get from having
>                               'match(query, subject)' do
>                               'findOverlaps(query, subject, select="first")' on
>                               IRanges/GRanges objects. If you need to do that,
>                               just use the latter, or, if you think that's
>                               still too much typing, define a wrapper e.g.
>                               'ovmatch(query, subject)'.
>
>                               There are plenty of specialized tools around for
>                               doing inexact/fuzzy/partial/overlap matching for
>                               many particular types of vector-like objects:
>                               grep() and family, pmatch(), charmatch(),
>                               agrep(), grepRaw(), matchPattern() and family,
>                               findOverlaps() and family, findInterval(),
>                               etc... For the reasons I mentioned above, none of
>                               them should hijack match() to make it do some
>                               particular type of inexact matching on some
>                               particular type of objects. Even if, for that
>                               particular type of objects, doing that particular
>                               type of inexact matching is more common than
>                               doing exact matching.
>
>                               H.
>
>
>
>                               On 01/06/2013 05:39 PM, Michael Lawrence wrote:
>
>                                   I think having overlapsAny is a nice addition
>                                   and helps make the API more complete and
>                                   explicit. Are you sure we need to change the
>                                   behavior of the match method for this
>                                   relatively uncommon use case?
>
>
>                               Yes because otherwise users with a use case of
>                               doing match()
>
>                               even if it's uncommon,
>
>
>                                   I don't think "match" always has to mean
>                                   "equality". It is a more general concept in
>                                   my mind. The most common use case for
>                                   matching ranges is overlap.
>
>
>                               Of course "match" doesn't always have to mean
>                               equality. But of base
>
>
>                                   Michael
>
>
>                                   On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès
>                                   <hpages at fhcrc.org> wrote:
>
>                                        Yes 'peaks %in% genes' is cute and was
>                                        probably doing the right thing for most
>                                        users (although not all). But 'exons
>                                        %in% genes' is cute too and was probably
>                                        doing the wrong thing for all users.
>                                        Advanced users like you guys would have
>                                        no problem switching to
>
>                                           !is.na(findOverlaps(peaks, genes,
>                                                               type="within",
>                                                               select="any"))
>
>                                        or
>
>                                           !is.na(findOverlaps(peaks, genes,
>                                                               type="equal",
>                                                               select="any"))
>
>                                        in case 'peaks %in% genes' was not doing
>                                        exactly what you wanted, but most users
>                                        would not find this particularly
>                                        friendly. Even worse, some users probably
>                                        didn't realize that 'peaks %in% genes'
>                                        was not doing exactly what they thought
>                                        it did because "peaks in genes" in
>                                        English suggests that the peaks are
>                                        within the genes, but it's not what
>                                        'peaks %in% genes' does.
>
>                                        Having overlapsAny(), with exactly the
>                                        same extra arguments as countOverlaps()
>                                        and subsetByOverlaps() (i.e. 'maxgap',
>                                        'minoverlap', 'type', 'ignore.strand'),
>                                        all of them documented (and with most
>                                        users more or less familiar with them
>                                        already) has the virtue of exposing the
>                                        user to all the options from the very
>                                        start, and of helping him/her make the
>                                        right choice. Of course there will be
>                                        users that don't want or don't have the
>                                        time to read/think about all the options.
>                                        Not a big deal: they'll just do
>                                        'overlapsAny(query, subject)', which is
>                                        not a lot more typing than 'query %in%
>                                        subject', especially if they use tab
>                                        completion.
>
>                                        It's true that it's more common to ask
>                                        questions about overlap than about
>                                        equality but there are some use cases for
>                                        the latter (as the original thread
>                                        shows). Until now, when you had such a
>                                        use case, you could not use match() or
>                                        %in%, which would have been the natural
>                                        things to use, because they got hijacked
>                                        to do something else, and you were left
>                                        with nothing. Not a satisfying situation.
>                                        So at a minimum, we needed to restore the
>                                        true/real/original semantic of match() to
>                                        do "equality" instead of "overlap". But
>                                        it's hard to do this for match() and not
>                                        do it for %in% too. For more than 99% of
>                                        R users, %in% is just a simple wrapper
>                                        for 'match(x, table, nomatch = 0) > 0'
>                                        (this is how it has been documented and
>                                        implemented in base R for many years).
>                                        Not maintaining this relationship between
>                                        %in% and match() would only cause grief
>                                        and frustration to newcomers to
>                                        Bioconductor.
>
>                                        H.
>
>
>
>                                        On 01/04/2013 03:32 PM, Cook, Malcolm wrote:
>
>                                            Hiya again,
>
>                                            I am definitely a late comer to BioC, so I
>                                            definitely easily defer to the tide of
>                                            history.
>
>                                            But I do think you miss my point Michael
>                                            about the proposed change making the
>                                            relationship between %in% and match for
>                                            {G,I}Ranges{List} mimic that between other
>                                            vectors, and I do think that changing the
>                                            API would make other late-comers take to
>                                            BioC easier/faster.
>
>                                            That said, I NEVER use %in% so I really have
>                                            no stake in the matter, and I DEFINITELY
>                                            appreciate the argument for not changing the
>                                            API just for semantic sweetness.
>
>                                            That that said, Herve is _/so good/_ about
>                                            deprecations and warnings that make such
>                                            changes fairly easily digestible.
>
>                                            That that that.... enough.... I bow out of
>                                            this one....!!!!
>
>                                            Always learning and Happy New Year to all
>                                            lurkers,
>
>                                            ~Malcolm
>
>                                            *From:* Michael Lawrence
>                                            [mailto:lawrence.michael at gene.com]
>                                            *Sent:* Friday, January 04, 2013 5:11 PM
>                                            *To:* Cook, Malcolm
>                                            *Cc:* Sean Davis; Michael Lawrence; Hervé
>                                            Pagès (hpages at fhcrc.org); Tim Triche, Jr.;
>                                            Vedran Franke; bioconductor at r-project.org
>                                            *Subject:* Re: [BioC] countMatches() (was:
>                                            table for GenomicRanges)
>
>
>                                            On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm
>                                            <MEC at stowers.org> wrote:
>
>                                            Hiya,
>
>                                            For what it is worth...
>
>                                            I think the change to %in% is warranted.
>
>                                            If I understand correctly, this change
>                                            restores the relationship between the
>                                            semantics of `%in%` and the semantics of
>                                            `match`.
>
>                                            From the docs:
>
>                                                '"%in%" <- function(x, table)
>                                                     match(x, table, nomatch = 0) > 0'
>
>                                            Herve's change restores this relationship.
>
>
>                                            match and %in% were initially consistent
>                                            (both considering any overlap); Herve has
>                                            changed both of them together. The whole
>                                            idea behind IRanges is that ranges are
>                                            special data types with special semantics.
>                                            We have reimplemented much of the existing R
>                                            vector API using those semantics; this
>                                            extends beyond match/%in%. I am hesitant
>                                            about making such sweeping changes to the
>                                            API so late in the life-cycle of the
>                                            package. There was a feature request for a
>                                            way to count identical ranges in a set of
>                                            ranges. Let's please not get carried away
>                                            and start redesigning the API for this one,
>                                            albeit useful, request. There are all sorts
>                                            of inconsistencies in the API, and many of
>                                            them were conscious decisions that
>                                            considered practical use cases.
>
>                                            Michael
>
>
>                                                 Herve, I suspect you were as a result
>                                                 able to completely drop all the
>                                                 `%in%,BiocClass1,BiocClass2` definitions
>                                                 and depend upon base::%in%
>
>                                                 Am I right?
>
>                                                 If so, may I suggest that Herve stay the
>                                                 course, with the addition of
>
>                                                    '"%ol%" <- function(a, b)
>                                                         findOverlaps(a, b, maxgap=0L,
>                                                         minoverlap=1L, type='any',
>                                                         select='all') > 0'
>
>                                                 This would provide a perspicacious
>                                                 idiom, thereby optimizing the API for
>                                                 Michael's observed common use case.
>
>                                                 Just sayin'
>
>                                                 ~Malcolm
>
>
>           .-----Original Message-----
>           .From: bioconductor-bounces at r-project.org
>           .[mailto:bioconductor-bounces at r-project.org] On Behalf Of Sean Davis
>           .Sent: Friday, January 04, 2013 3:37 PM
>           .To: Michael Lawrence
>           .Cc: Tim Triche, Jr.; Vedran Franke; bioconductor at r-project.org
>           .Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
>           .
>           .On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
>           .<lawrence.michael at gene.com> wrote:
>           .> The change to the behavior of %in% is a pretty big one. Are you
>           .> thinking that all set-based operations should behave this way? For
>           .> example, setdiff and intersect? I really liked the syntax of
>           .> "peaks %in% genes". In my experience, it's way more common to ask
>           .> questions about overlap than about equality, so I'd rather optimize
>           .> the API for that use case. But again, that's just my personal bias.
>           .
>           .For what it is worth, I share Michael's personal bias here.
>           .
>           .Sean
>           .
>           .
>           .> Michael
>           .>
>           .>
>           .> On Fri, Jan 4, 2013 at 1:11 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>           .>
>           .>> Hi,
>           .>>
>           .>> I added findMatches() and countMatches() to the latest IRanges /
>           .>> GenomicRanges packages (in BioC devel only).
>           .>>
>           .>>   findMatches(x, table): An enhanced version of ‘match’ that
>           .>>           returns all the matches in a Hits object.
>           .>>
>           .>>   countMatches(x, table): Returns an integer vector of the length
>           .>>           of ‘x’, containing the number of matches in ‘table’ for
>           .>>           each element in ‘x’.
>           .>>
>           .>> countMatches() is what you can use to tally/count/tabulate (choose
>           .>> your preferred term) the unique elements in a GRanges object:
>           .>>
>           .>>   library(GenomicRanges)
>           .>>   set.seed(33)
>           .>>   gr <- GRanges("chr1", IRanges(sample(15,20,replace=TRUE), width=5))
>           .>>
>           .>> Then:
>           .>>
>           .>>   > gr_levels <- sort(unique(gr))
>           .>>   > countMatches(gr_levels, gr)
>           .>>    [1] 1 1 1 2 4 2 2 1 2 2 2
>           .>>
>           .>> Note that findMatches() and countMatches() also work on IRanges and
>           .>> DNAStringSet objects, as well as on ordinary atomic vectors:
>           .>>
>           .>>   library(hgu95av2probe)
>           .>>   library(Biostrings)
>           .>>   probes <- DNAStringSet(hgu95av2probe)
>           .>>   unique_probes <- unique(probes)
>           .>>   count <- countMatches(unique_probes, probes)
>           .>>   max(count)  # 7
>           .>>
>           .>> I made other changes in IRanges/GenomicRanges so that the notion
>           .>> of "match" between elements of a vector-like object now consistently
>           .>> means "equality" instead of "overlap", even for range-based objects
>           .>> like IRanges or GRanges objects. This notion of "equality" is the
>           .>> same that is used by ==. The most visible consequence of those
>           .>> changes is that using %in% between 2 IRanges or GRanges objects
>           .>> 'query' and 'subject' in order to do overlaps was replaced by
>           .>> overlapsAny(query, subject).
>           .>>
>           .>>   overlapsAny(query, subject): Finds the ranges in ‘query’ that
>           .>>      overlap any of the ranges in ‘subject’.
>           .>>
>           .>> There are warnings and deprecation messages in place to help smooth
>           .>> the transition.
>           .>>
>           .>> Cheers,
>           .>> H.
>           .>>
>           .>> --
>           .>> Hervé Pagès
>           .>>
>           .>> Program in Computational Biology
>           .>> Division of Public Health Sciences
>           .>> Fred Hutchinson Cancer Research Center
>           .>> 1100 Fairview Ave. N, M1-B514
>           .>> P.O. Box 19024
>           .>> Seattle, WA 98109-1024
>           .>>
>           .>> E-mail: hpages at fhcrc.org
>           .>> Phone: (206) 667-5791
>           .>> Fax: (206) 667-1319
>           .>>
>           .>
>           .>         [[alternative HTML version deleted]]
>           .>
>           .> _______________________________________________
>           .> Bioconductor mailing list
>           .> Bioconductor at r-project.org
>           .> https://stat.ethz.ch/mailman/listinfo/bioconductor
>           .> Search the archives:
>           .> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>     ...
>
>     [Message clipped]
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


