[BioC] table for GenomicRanges

Hervé Pagès hpages at fhcrc.org
Thu Dec 6 04:06:25 CET 2012


Hi Tim,

On 12/05/2012 04:55 PM, Tim Triche, Jr. wrote:
> that's cool, then it's consistent with score() and friends... this
> sounds like a grand scheme

It's not clear to me whether you are voting for tally(.) + mcols(.)$tally
or for count(.) + mcols(.)$count.

Note that this functionality doesn't have to be restricted to
GenomicRanges and can be extended to any Vector object 'x' for
which unique(), sort() and match() work and "do the right thing".
Then the implementation is simply:

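     ## Assumes unique(), sort() and match() test *equality* on 'x'
     ans <- sort(unique(x))             # distinct elements, sorted
     names(ans) <- mcols(ans) <- NULL   # drop names and metadata columns
     y <- match(x, ans)                 # position of each element of 'x' in 'ans'
     mcols(ans)$count <- tabulate(y, nbins=length(ans))  # occurrences per element
     ans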
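The same idiom can be tried out in base R. The following is a sketch, not GenomicRanges code: it assumes (hypothetically) that ranges are stored as rows of a plain data.frame with columns seqnames, start, end and strand, and the helper name count_ranges() is made up for illustration. Row equality is obtained by pasting the four columns into one string key per row, so that match() compares whole rows.

```r
## Hypothetical base-R helper (not part of GenomicRanges). Ranges are
## assumed to be rows of a data.frame with columns seqnames, start, end
## and strand; it follows the unique() / sort / match() / tabulate() idiom.
count_ranges <- function(df) {
  cols <- c("seqnames", "start", "end", "strand")
  # One string key per row, so that match() tests whole-row equality
  row_key <- function(d) paste(d$seqnames, d$start, d$end, d$strand, sep = "\r")
  # Keep only the range columns (drops any extra "metadata" columns),
  # then take the distinct rows
  ans <- unique(df[cols])
  # Sort order chosen for this sketch: seqnames, start, end, strand
  ans <- ans[order(ans$seqnames, ans$start, ans$end, ans$strand), , drop = FALSE]
  rownames(ans) <- NULL
  # Position of each input row among the distinct rows
  y <- match(row_key(df), row_key(ans))
  # Number of occurrences of each distinct row
  ans$count <- tabulate(y, nbins = nrow(ans))
  ans
}

df <- data.frame(seqnames = c("chr1", "chr1", "chr2", "chr1"),
                 start    = c(10L, 10L, 5L, 1L),
                 end      = c(20L, 20L, 8L, 4L),
                 strand   = c("+", "+", "-", "+"),
                 score    = c(0.1, 0.2, 0.3, 0.4),
                 stringsAsFactors = FALSE)
res <- count_ranges(df)
res$count  # c(1L, 2L, 1L): [1,4] once, [10,20] twice, chr2 [5,8] once
```

The counts always sum to the number of input rows, since every row matches exactly one distinct row.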

Unfortunately, match() on GenomicRanges and Ranges objects currently
reports a match in case of *overlap* instead of *equality* (this is
why I needed to use IRanges:::matchIntegerQuads in the implementation
of tableGenomicRanges2() I showed previously). Just a heads-up: I'd
like to change this, but with a transition plan, e.g. an extra
argument to the match() generic that lets the user control what
"match" means (the default would be "overlaps" for now but should
probably be changed to "equality" in the future).
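To make the difference between the two meanings of "match" concrete, here is a small base-R sketch. It is not the IRanges/GenomicRanges match() method: intervals on a single sequence are assumed to be plain integer start/end vectors, and both the helper name match_ranges() and its 'type' argument are hypothetical. The default of "overlaps" mirrors the transitional default proposed above.

```r
## Hypothetical helper (not the IRanges/GenomicRanges match() method).
## Intervals on one sequence are plain integer start/end vectors, and
## 'type' is an assumed argument name for what "match" means.
match_ranges <- function(x_start, x_end, t_start, t_end,
                         type = c("overlaps", "equality")) {
  type <- match.arg(type)
  vapply(seq_along(x_start), function(i) {
    hits <- if (type == "equality") {
      # same start and same end
      which(t_start == x_start[i] & t_end == x_end[i])
    } else {
      # closed intervals overlap when each one starts no later than
      # the other one ends
      which(t_start <= x_end[i] & x_start[i] <= t_end)
    }
    # like base match(): index of the first hit, or NA if there is none
    if (length(hits) > 0L) hits[1L] else NA_integer_
  }, integer(1))
}

x_start <- c(1L, 5L); x_end <- c(10L, 8L)   # query intervals [1,10] and [5,8]
t_start <- c(3L, 5L); t_end <- c(12L, 8L)   # table intervals [3,12] and [5,8]
ov <- match_ranges(x_start, x_end, t_start, t_end)                     # c(1L, 1L)
eq <- match_ranges(x_start, x_end, t_start, t_end, type = "equality")  # c(NA, 2L)
```

With overlap semantics, [5,8] is matched to [3,12] rather than to its exact copy in the table, so tabulating those indices would give wrong counts; with equality semantics it is matched to position 2.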

Cheers,
H.

>
>
>
> On Wed, Dec 5, 2012 at 4:31 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>     So with tally, what would be the name of the metadata col? Would "tally"
>     be OK?
>
>     It's kind of neat to use the same name for the metadata col as for the
>     function itself. That makes the code more readable:
>
>        mcols(count(gr))$count
>
>     Thanks for the feedback,
>     H.
>
>
>
>     On 12/05/2012 03:05 PM, Tim Triche, Jr. wrote:
>
>         that goes along nicely with bamTally in gmapR, which is a damn useful
>         function IMHO... actually I think I wrote a tally() function or
>         something like it for Rsubread, can't remember whether I sent it in
>         with the patch though. Anyways, a running tally is a good mental
>         image for this functionality.
>
>
>
>
>         On Wed, Dec 5, 2012 at 2:59 PM, Steve Lianoglou
>         <mailinglist.honeypot at gmail.com> wrote:
>
>              Hi,
>
>              On Wed, Dec 5, 2012 at 5:50 PM, Michael Lawrence
>              <lawrence.michael at gene.com> wrote:
>              > The question is whether there is ever a use case for a simple
>              > table. This is analogous to base R's table and data.frame. For
>              > example, if you call xtabs(), you get a table, and then you have
>              > to call as.data.frame() to get back into the data.frame context.
>              > This is sort of clean, and for efficiency we could create an
>              > extension of table that stores the counts in the .Data slot
>              > along with the associated GRanges. Then as(x, "GRanges") on that
>              > would generate the GRanges with the count column. That would be
>              > complicated, though.
>              >
>              > Another issue is that table() cannot be used in the general way,
>              > due to restrictions on dispatch with "...".
>              >
>              > So I think I'm in favor of a new "count" generic. That naming is
>              > consistent with countOverlaps, countSubjects, countQueries, etc.
>
>              Or maybe `tally`? Somehow I associate that with being closer to
>              `tabulate`, but I guess it really isn't, and maybe it's just me
>              who has a mind map that puts `tally` closer to `table` than
>              `count` is.
>
>              --
>              Steve Lianoglou
>              Graduate Student: Computational Systems Biology
>                | Memorial Sloan-Kettering Cancer Center
>                | Weill Medical College of Cornell University
>              Contact Info: http://cbio.mskcc.org/~lianos/contact
>
>
>
>
>         --
>         A model is a lie that helps you see the truth.
>         Howard Skipper
>         <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>
>
>     --
>     Hervé Pagès
>
>     Program in Computational Biology
>     Division of Public Health Sciences
>     Fred Hutchinson Cancer Research Center
>     1100 Fairview Ave. N, M1-B514
>     P.O. Box 19024
>     Seattle, WA 98109-1024
>
>     E-mail: hpages at fhcrc.org
>     Phone: (206) 667-5791
>     Fax: (206) 667-1319
>
>
>
>
>




More information about the Bioconductor mailing list