[BioC] subset GRanges object via ElementMetadata

Hervé Pagès hpages at fhcrc.org
Mon Feb 25 09:00:22 CET 2013


Hi Michael,

On 02/23/2013 04:50 AM, Michael Lawrence wrote:
> Hi Hervé,
>
> That's what I ended up doing, actually. One question that came up though
> is whether we want to support 2D subsetting of all (or at least most)
> Vector objects, in the same manner as GRanges. I think it would work,
> how about you?

If by 2D subsetting you're referring to gr[i,j], I'm opposed to it.
I think it's a mistake to try to put the 2D *low-level* API on top of
objects that are conceptually not 2D objects. The current situation,
where 2D subsetting already works on both GRanges and GRangesList
objects but does different things on each, is messy and tells me
that we shouldn't have provided this in the first place.

Sounds like the gr$foo story again. Hopefully gr$foo will remain a
one-time exception.

I think subset() is already giving you something similar to 2D
subsetting, right?

H.

>
> Michael
>
>
> On Fri, Feb 22, 2013 at 5:33 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>     Hi Michael,
>
>
>     On 02/22/2013 12:56 PM, Michael Lawrence wrote:
>
>         Btw, I hacked together a subset() method for GenomicRanges
>         yesterday. It respects the metadata columns. Someone could
>         probably come up with some reason why that violates the
>         conceptual foundations of something, but I find it useful.
>
>         So you could do:
>         subset(gr, over == 2)
>
>
>     Sounds good to me. Hopefully you set the method on Vector objects,
>     rather than just GenomicRanges objects.
>
>     Thanks,
>     H.
>
>
>
>         Will commit shortly.
>
>         Michael
>
>
>
>
>
>         On Fri, Feb 22, 2013 at 10:10 AM, Tim Triche, Jr.
>         <tim.triche at gmail.com> wrote:
>
>             the shorthand method would be
>
>             GR[ GR$over == 2 ]
>
>             and in your example,
>
>             R> test.gr
>             GRanges with 6 ranges and 3 metadata columns:
>                   seqnames           ranges strand |  edensity     epeak      over
>                      <Rle>        <IRanges>  <Rle> | <integer> <integer> <integer>
>               [1]     chr1 [713844, 714487]      * |      1000       256         1
>               [2]     chr1 [762136, 763199]      * |      1000       771         2
>               [3]     chr1 [780124, 780289]      * |       519        74         0
>               [4]     chr1 [780533, 780677]      * |       516        68         0
>               [5]     chr1 [781104, 781387]      * |       601       140         0
>               [6]     chr1 [793830, 794396]      * |       610       290         0
>               ---
>               seqlengths:
>                 chr1 chr10 chr11 chr12 chr13 chr14 ...  chr6  chr7  chr8  chr9  chrX  chrY
>                   NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA
>             R> test.gr[ test.gr$over == 2 ]
>             GRanges with 1 range and 3 metadata columns:
>                   seqnames           ranges strand |  edensity     epeak      over
>                      <Rle>        <IRanges>  <Rle> | <integer> <integer> <integer>
>               [1]     chr1 [762136, 763199]      * |      1000       771         2
>               ---
>               seqlengths:
>                 chr1 chr10 chr11 chr12 chr13 chr14 ...  chr6  chr7  chr8  chr9  chrX  chrY
>                   NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA
>
>
>
>
>             On Fri, Feb 22, 2013 at 7:33 AM, Hermann Norpois
>             <hnorpois at gmail.com> wrote:
>
>                 Hello,
>
>                 I am looking for a method to subset a GRangesObject by
>                 means of values (or ElementMetadata column), for
>                 instance over==2.
>
>                 How does it work?
>
>                 Thanks
>                 Hermann
>
>
>                     test.gr
>
>                 GRanges with 6 ranges and 3 metadata columns:
>                       seqnames           ranges strand |  edensity     epeak      over
>                          <Rle>        <IRanges>  <Rle> | <integer> <integer> <integer>
>                   [1]     chr1 [713844, 714487]      * |      1000       256         1
>                   [2]     chr1 [762136, 763199]      * |      1000       771         2
>                   [3]     chr1 [780124, 780289]      * |       519        74         0
>                   [4]     chr1 [780533, 780677]      * |       516        68         0
>                   [5]     chr1 [781104, 781387]      * |       601       140         0
>                   [6]     chr1 [793830, 794396]      * |       610       290         0
>                   ---
>                   seqlengths:
>                     chr1 chr10 chr11 chr12 chr13 chr14 ...  chr6  chr7  chr8  chr9  chrX  chrY
>                       NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA
>
>                     dput (test.gr)
>
>                 new("GRanges"
>                     , seqnames = new("Rle"
>                     , values = structure(1L, .Label = c("chr1", "chr10",
>                 "chr11", "chr12", "chr13", "chr14", "chr15", "chr16",
>                 "chr17", "chr18", "chr19", "chr2", "chr20", "chr21",
>                 "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8",
>                 "chr9", "chrX", "chrY"), class = "factor")
>                     , lengths = 6L
>                     , elementMetadata = NULL
>                     , metadata = list()
>                 )
>                     , ranges = new("IRanges"
>                     , start = c(713844L, 762136L, 780124L, 780533L,
>                 781104L, 793830L)
>                     , width = c(644L, 1064L, 166L, 145L, 284L, 567L)
>                     , NAMES = NULL
>                     , elementType = "integer"
>                     , elementMetadata = NULL
>                     , metadata = list()
>                 )
>                     , strand = new("Rle"
>                     , values = structure(3L, .Label = c("+", "-", "*"),
>                 class = "factor")
>                     , lengths = 6L
>                     , elementMetadata = NULL
>                     , metadata = list()
>                 )
>                     , elementMetadata = new("DataFrame"
>                     , rownames = NULL
>                     , nrows = 6L
>                     , listData = structure(list(edensity = c(1000L, 1000L,
>                 519L, 516L, 601L, 610L), epeak = c(256L, 771L, 74L, 68L,
>                 140L, 290L), over = c(1L, 2L, 0L, 0L, 0L, 0L)),
>                 .Names = c("edensity", "epeak", "over"))
>                     , elementType = "ANY"
>                     , elementMetadata = NULL
>                     , metadata = list()
>                 )
>                     , seqinfo = new("Seqinfo"
>                     , seqnames = c("chr1", "chr10", "chr11", "chr12",
>                 "chr13", "chr14", "chr15", "chr16", "chr17", "chr18",
>                 "chr19", "chr2", "chr20", "chr21", "chr22", "chr3",
>                 "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chrX",
>                 "chrY")
>                     , seqlengths = c(NA_integer_, NA_integer_,
>                 NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>                 NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>                 NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>                 NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>                 NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>                 NA_integer_, NA_integer_)
>                     , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA,
>                 NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)
>                     , genome = c(NA_character_, NA_character_,
>                 NA_character_, NA_character_, NA_character_, NA_character_,
>                 NA_character_, NA_character_, NA_character_, NA_character_,
>                 NA_character_, NA_character_, NA_character_, NA_character_,
>                 NA_character_, NA_character_, NA_character_, NA_character_,
>                 NA_character_, NA_character_, NA_character_, NA_character_,
>                 NA_character_, NA_character_)
>                 )
>                     , metadata = list()
>                 )
>
>
>                 _________________________________________________
>                 Bioconductor mailing list
>                 Bioconductor at r-project.org
>                 https://stat.ethz.ch/mailman/listinfo/bioconductor
>                 Search the archives:
>                 http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
>
>             --
>             *A model is a lie that helps you see the truth.*
>
>             Howard Skipper
>             <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>
>
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


