[BioC] subset GRanges object via ElementMetadata

Sat Feb 23 01:27:40 CET 2013

On 02/22/2013 02:35 PM, Tim Triche, Jr. wrote:
> That's odd...  I added an NA and sure enough, it fails:
>
> R> test.gr[ test.gr$over == 2 ]
> Error in IRanges:::normalizeSingleBracketSubscript(i, x) :
>    subscript contains NAs
>
> But which() works fine:
>
> R> test.gr[ which(test.gr$over == 2) ]
> GRanges with 1 range and 3 metadata columns:
>        seqnames           ranges strand |  edensity     epeak      over
>           <Rle>        <IRanges>  <Rle> | <integer> <integer> <integer>
>    [1]     chr1 [762136, 763199]      * |      1000       771         2
>    ---
>
> I wonder if this is an easy fix, too?

In base R, subscripting with NA leads to

 > x = 1:5
 > x[NA]
[1] NA NA NA NA NA

which makes a weird kind of sense (the length-1 logical NA is recycled), but
IRanges / GRanges don't support the notion of NA ranges. So the answer is
probably that this is unsupported by design rather than a bug to be fixed.
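
To make the base R behaviour and the usual workarounds concrete, here is a
minimal sketch that uses plain vectors only, no GRanges. The vector 'over' is a
hypothetical stand-in for test.gr$over with one NA added; the other variable
names are made up for illustration as well.

```r
# Base R only. 'over' is a hypothetical stand-in for test.gr$over with an NA.
over <- c(1L, 2L, NA, 0L, 0L, 0L)
x <- seq_along(over)

res_na_lgl <- x[NA]            # logical NA of length 1 is recycled: 6 NAs
res_na_int <- x[NA_integer_]   # integer NA is a single index: 1 NA

sel_eq <- over == 2            # the NA in 'over' propagates into the subscript
sel_eq
# [1] FALSE  TRUE    NA FALSE FALSE FALSE

res_eq <- x[sel_eq]            # base R keeps an NA element for the NA slot
res_eq
# [1]  2 NA

res_which <- x[which(sel_eq)]  # which() returns only the TRUE positions
res_in <- x[over %in% 2]       # %in% never returns NA
res_guard <- x[!is.na(over) & over == 2]  # explicit NA guard
```

All three workarounds select only element 2. The which() and %in% forms are the
ones reported in this thread to work on a GRanges; the explicit is.na() guard
should behave the same way there because it also yields an NA-free logical
subscript, but that is an assumption, not something tested in this thread.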

Martin

> On Fri, Feb 22, 2013 at 2:26 PM, Arnaud Amzallag
> <arnaud.amzallag at gmail.com> wrote:
>
>> test.gr[values(test.gr)$over %in% 2]
>>
>> works.
>>
>> test.gr[values(test.gr)$over == 2] works too if over does not contain
>> NAs.
>>
>> Arnaud
>>
>> On Feb 22, 2013, at 10:33 AM, Hermann Norpois wrote:
>>
>>> Hello,
>>>
>>> I am looking for a method to subset a GRanges object by means of values (or
>>> ElementMetadata column), for instance
>>> over==2.
>>>
>>> How does it work?
>>>
>>> Thanks
>>> Hermann
>>>
>>>
>>>> test.gr
>>> GRanges with 6 ranges and 3 metadata columns:
>>>       seqnames           ranges strand |  edensity     epeak      over
>>>          <Rle>        <IRanges>  <Rle> | <integer> <integer> <integer>
>>>   [1]     chr1 [713844, 714487]      * |      1000       256         1
>>>   [2]     chr1 [762136, 763199]      * |      1000       771         2
>>>   [3]     chr1 [780124, 780289]      * |       519        74         0
>>>   [4]     chr1 [780533, 780677]      * |       516        68         0
>>>   [5]     chr1 [781104, 781387]      * |       601       140         0
>>>   [6]     chr1 [793830, 794396]      * |       610       290         0
>>>   ---
>>>   seqlengths:
>>>     chr1 chr10 chr11 chr12 chr13 chr14 ...  chr6  chr7  chr8  chr9  chrX  chrY
>>>       NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA
>>>> dput (test.gr)
>>> new("GRanges"
>>>     , seqnames = new("Rle"
>>>     , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12",
>>> "chr13",
>>> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2",
>>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7",
>>> "chr8", "chr9", "chrX", "chrY"), class = "factor")
>>>     , lengths = 6L
>>>     , elementMetadata = NULL
>>>     , metadata = list()
>>> )
>>>     , ranges = new("IRanges"
>>>     , start = c(713844L, 762136L, 780124L, 780533L, 781104L, 793830L)
>>>     , width = c(644L, 1064L, 166L, 145L, 284L, 567L)
>>>     , NAMES = NULL
>>>     , elementType = "integer"
>>>     , elementMetadata = NULL
>>>     , metadata = list()
>>> )
>>>     , strand = new("Rle"
>>>     , values = structure(3L, .Label = c("+", "-", "*"), class = "factor")
>>>     , lengths = 6L
>>>     , elementMetadata = NULL
>>>     , metadata = list()
>>> )
>>>     , elementMetadata = new("DataFrame"
>>>     , rownames = NULL
>>>     , nrows = 6L
>>>     , listData = structure(list(edensity = c(1000L, 1000L, 519L, 516L,
>>> 601L, 610L
>>> ), epeak = c(256L, 771L, 74L, 68L, 140L, 290L), over = c(1L,
>>> 2L, 0L, 0L, 0L, 0L)), .Names = c("edensity", "epeak", "over"))
>>>     , elementType = "ANY"
>>>     , elementMetadata = NULL
>>>     , metadata = list()
>>> )
>>>     , seqinfo = new("Seqinfo"
>>>     , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14",
>>> "chr15",
>>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21",
>>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9",
>>> "chrX", "chrY")
>>>     , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>>> NA_integer_,
>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_)
>>>     , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>>> NA, NA,
>>> NA, NA, NA, NA, NA, NA, NA, NA, NA)
>>>     , genome = c(NA_character_, NA_character_, NA_character_, NA_character_,
>>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
>>> )
>>> )
>>>     , metadata = list()
>>> )
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793