[BioC] countMatches() (was: table for GenomicRanges)

Cook, Malcolm MEC at stowers.org
Fri Jan 4 22:56:27 CET 2013


Hiya,

For what it is worth...

I think the change to %in% is warranted.

If I understand correctly, this change restores the relationship between the semantics of `%in%` and the semantics of `match`.

From the docs:

  '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'

Herve's change restores this relationship.
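
For illustration, here is a quick base-R check of that relationship on ordinary character vectors. The vectors `x` and `table` are made up for the example; nothing here touches IRanges or GenomicRanges.

```r
# Base R only: compare %in% with the match()-based definition from the docs.
x     <- c("a", "b", "z")
table <- c("b", "a", "a", "c")

# match() returns the position of the first match, or 0 when there is none.
via_match <- match(x, table, nomatch = 0) > 0
via_in    <- x %in% table

# Both routes give the same logical vector.
stopifnot(identical(via_match, via_in))
via_in  # TRUE TRUE FALSE
```

Note that "a" occurs twice in `table` but still yields a single TRUE: like `match`, `%in%` only reports whether an equal element exists.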

Herve, I suspect that as a result you were able to completely drop all the `%in%,BiocClass1,BiocClass2` method definitions and depend upon base::%in%.

Am I right?

If so, may I suggest that Herve stay the course, with the addition of 
  '"%ol%" <- function(a, b) countOverlaps(a, b, maxgap=0L, minoverlap=1L, type="any") > 0'

This would provide a perspicuous idiom, thereby optimizing the API for Michael's observed common use case.
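
A minimal sketch of such an operator, for illustration only. It assumes countOverlaps() from IRanges/GenomicRanges, which returns one count per query range, so comparing with 0 yields a logical vector (findOverlaps() returns a Hits object rather than counts). The name `%ol%` and the toy `peaks` and `genes` objects are hypothetical, and the overlap arguments are left at their defaults.

```r
library(GenomicRanges)  # also attaches IRanges

# Hypothetical operator: TRUE for each range in 'a' that overlaps
# at least one range in 'b'. Not part of any package.
"%ol%" <- function(a, b) countOverlaps(a, b, type = "any") > 0

# Toy data, made up for the example.
peaks <- GRanges("chr1", IRanges(start = c(1, 50, 200), width = 10))
genes <- GRanges("chr1", IRanges(start = c(5, 205), width = 20))

# peaks cover 1-10, 50-59, 200-209; genes cover 5-24, 205-224.
peaks %ol% genes  # TRUE FALSE TRUE
```

On these inputs overlapsAny(peaks, genes) should give the same answer, so the operator would be little more than infix sugar for it.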

Just sayin'

~Malcolm


 .-----Original Message-----
 .From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Sean Davis
 .Sent: Friday, January 04, 2013 3:37 PM
 .To: Michael Lawrence
 .Cc: Tim Triche, Jr.; Vedran Franke; bioconductor at r-project.org
 .Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
 .
 .On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
 .<lawrence.michael at gene.com> wrote:
 .> The change to the behavior of %in% is a pretty big one. Are you thinking
 .> that all set-based operations should behave this way? For example, setdiff
 .> and intersect? I really liked the syntax of "peaks %in% genes". In my
 .> experience, it's way more common to ask questions about overlap than about
 .> equality, so I'd rather optimize the API for that use case. But again,
 .> that's just my personal bias.
 .
 .For what it is worth, I share Michael's personal bias here.
 .
 .Sean
 .
 .
 .> Michael
 .>
 .>
 .> On Fri, Jan 4, 2013 at 1:11 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
 .>
 .>> Hi,
 .>>
 .>> I added findMatches() and countMatches() to the latest IRanges /
 .>> GenomicRanges packages (in BioC devel only).
 .>>
 .>>   findMatches(x, table): An enhanced version of ‘match’ that
 .>>           returns all the matches in a Hits object.
 .>>
 .>>   countMatches(x, table): Returns an integer vector of the length
 .>>           of ‘x’, containing the number of matches in ‘table’ for
 .>>           each element in ‘x’.
 .>>
 .>> countMatches() is what you can use to tally/count/tabulate (choose your
 .>> preferred term) the unique elements in a GRanges object:
 .>>
 .>>   library(GenomicRanges)
 .>>   set.seed(33)
 .>>   gr <- GRanges("chr1", IRanges(sample(15,20,replace=TRUE), width=5))
 .>>
 .>> Then:
 .>>
 .>>   > gr_levels <- sort(unique(gr))
 .>>   > countMatches(gr_levels, gr)
 .>>    [1] 1 1 1 2 4 2 2 1 2 2 2
 .>>
 .>> Note that findMatches() and countMatches() also work on IRanges and
 .>> DNAStringSet objects, as well as on ordinary atomic vectors:
 .>>
 .>>   library(hgu95av2probe)
 .>>   library(Biostrings)
 .>>   probes <- DNAStringSet(hgu95av2probe)
 .>>   unique_probes <- unique(probes)
 .>>   count <- countMatches(unique_probes, probes)
 .>>   max(count)  # 7
 .>>
 .>> I made other changes in IRanges/GenomicRanges so that the notion
 .>> of "match" between elements of a vector-like object now consistently
 .>> means "equality" instead of "overlap", even for range-based objects
 .>> like IRanges or GRanges objects. This notion of "equality" is the
 .>> same that is used by ==. The most visible consequence of those
 .>> changes is that using %in% between 2 IRanges or GRanges objects
 .>> 'query' and 'subject' in order to do overlaps was replaced by
 .>> overlapsAny(query, subject).
 .>>
 .>>   overlapsAny(query, subject): Finds the ranges in ‘query’ that
 .>>      overlap any of the ranges in ‘subject’.
 .>>
 .>> There are warnings and deprecation messages in place to help smooth
 .>> the transition.
 .>>
 .>> Cheers,
 .>> H.
 .>>
 .>> --
 .>> Hervé Pagès
 .>>
 .>> Program in Computational Biology
 .>> Division of Public Health Sciences
 .>> Fred Hutchinson Cancer Research Center
 .>> 1100 Fairview Ave. N, M1-B514
 .>> P.O. Box 19024
 .>> Seattle, WA 98109-1024
 .>>
 .>> E-mail: hpages at fhcrc.org
 .>> Phone:  (206) 667-5791
 .>> Fax:    (206) 667-1319
 .>>
 .>
 .>
 .> _______________________________________________
 .> Bioconductor mailing list
 .> Bioconductor at r-project.org
 .> https://stat.ethz.ch/mailman/listinfo/bioconductor
 .> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
 .

