[BioC] countMatches() (was: table for GenomicRanges)

Hervé Pagès hpages at fhcrc.org
Fri Jan 4 22:11:08 CET 2013


Hi,

I added findMatches() and countMatches() to the latest IRanges / 
GenomicRanges packages (in BioC devel only).

   findMatches(x, table): An enhanced version of ‘match’ that
           returns all the matches in a Hits object.

   countMatches(x, table): Returns an integer vector of the length
           of ‘x’, containing the number of matches in ‘table’ for
           each element in ‘x’.
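
For illustration only, the semantics described above can be sketched in base R on ordinary atomic vectors. The helper names count_matches_sketch() and find_matches_sketch() are hypothetical, and a plain data frame stands in for the Hits object; this is not how IRanges implements these functions.

```r
# Base R sketch of the semantics only; NOT the IRanges implementation.
# count_matches_sketch() and find_matches_sketch() are hypothetical names.

# Number of elements of 'table' equal to each element of 'x'.
count_matches_sketch <- function(x, table) {
  vapply(seq_along(x), function(i) sum(table %in% x[i]), integer(1))
}

# All (query, subject) index pairs where x[query] equals table[subject].
# A data frame is used here as a stand-in for a Hits object.
find_matches_sketch <- function(x, table) {
  hits <- lapply(seq_along(x), function(i) which(table %in% x[i]))
  data.frame(
    queryHits   = rep(seq_along(x), lengths(hits)),
    subjectHits = as.integer(unlist(hits))
  )
}

x   <- c("a", "b", "c")
tbl <- c("b", "a", "b", "d", "b")

counts  <- count_matches_sketch(x, tbl)   # one count per element of x
hits_df <- find_matches_sketch(x, tbl)    # every match, not just the first

print(match(x, tbl))  # base match(): first match only, NA when none
print(counts)
print(hits_df)
```

Unlike match(), which reports only the first matching position, the sketch keeps every matching pair, and an element of 'x' with no match simply gets a count of 0.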

countMatches() is what you can use to tally/count/tabulate (choose your
preferred term) the unique elements in a GRanges object:

   library(GenomicRanges)
   set.seed(33)
   gr <- GRanges("chr1", IRanges(sample(15,20,replace=TRUE), width=5))

Then:

   > gr_levels <- sort(unique(gr))
   > countMatches(gr_levels, gr)
    [1] 1 1 1 2 4 2 2 1 2 2 2
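
The same "sort the unique values, then count" idiom can be tried on an ordinary integer vector without any Bioconductor package. This is only a base R analogy under the assumption that plain integers stand in for ranges; the variable names are made up for the example, and no output is shown because sample() results depend on the R version.

```r
# Base R analogy of the tally above; plain integers stand in for ranges.
set.seed(33)
v <- sample(15, 20, replace = TRUE)

# The distinct values in sorted order play the role of 'gr_levels'.
v_levels <- sort(unique(v))

# For each distinct value, count how many elements of 'v' equal it.
v_counts <- vapply(v_levels, function(e) sum(v == e), integer(1))

# Sanity checks: every element is counted exactly once, and the
# result agrees with base R's table().
stopifnot(sum(v_counts) == length(v))
stopifnot(identical(v_counts, as.vector(table(v))))

print(data.frame(value = v_levels, count = v_counts))
```

For atomic vectors table() already does this; the point of countMatches() is that the same tally becomes available for objects such as GRanges.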

Note that findMatches() and countMatches() also work on IRanges and
DNAStringSet objects, as well as on ordinary atomic vectors:

   library(hgu95av2probe)
   library(Biostrings)
   probes <- DNAStringSet(hgu95av2probe)
   unique_probes <- unique(probes)
   count <- countMatches(unique_probes, probes)
   max(count)  # 7

I made other changes in IRanges/GenomicRanges so that a "match"
between elements of a vector-like object now consistently means
"equality" instead of "overlap", even for range-based objects
like IRanges or GRanges objects. This notion of "equality" is the
same one that == uses. The most visible consequence of those
changes is that using %in% between 2 IRanges or GRanges objects
'query' and 'subject' to test for overlaps has been replaced by
overlapsAny(query, subject).

   overlapsAny(query, subject): Finds the ranges in ‘query’ that
      overlap any of the ranges in ‘subject’.
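
To make the difference between the two notions concrete, here is a small base R sketch in which ranges are plain start/end pairs in data frames. It is an assumption-laden illustration: sequence names and strand are ignored, and equal_any() and overlaps_any_sketch() are hypothetical helpers, not the IRanges/GenomicRanges functions.

```r
# Base R sketch contrasting "equality" with "overlap" for closed
# integer ranges [start, end]. Helper names are hypothetical.
query   <- data.frame(start = c(1L, 10L, 20L), end = c(5L, 14L, 24L))
subject <- data.frame(start = c(3L, 10L),      end = c(7L, 14L))

# Equality: a query range matches only an identical subject range.
equal_any <- function(q, s) {
  vapply(seq_len(nrow(q)), function(i) {
    any(s$start == q$start[i] & s$end == q$end[i])
  }, logical(1))
}

# Overlap: a query range matches any subject range it shares a
# position with.
overlaps_any_sketch <- function(q, s) {
  vapply(seq_len(nrow(q)), function(i) {
    any(s$start <= q$end[i] & q$start[i] <= s$end)
  }, logical(1))
}

is_equal   <- equal_any(query, subject)
is_overlap <- overlaps_any_sketch(query, subject)

print(is_equal)    # FALSE  TRUE FALSE
print(is_overlap)  #  TRUE  TRUE FALSE
```

The first query range [1, 5] overlaps the subject range [3, 7] without being equal to it, which is exactly the case where the two notions give different answers.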

There are warnings and deprecation messages in place to help smooth
the transition.

Cheers,
H.

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


