[BioC] Selecting elements in GRanges object by element metadata

Michael Muratet mmuratet at hudsonalpha.org
Wed Jul 11 17:26:28 CEST 2012


On Jul 11, 2012, at 10:21 AM, Kasper Daniel Hansen wrote:

> What you are reporting is true for any (well, there may be exceptions
> I guess) subsetting.  Try for example with a standard matrix.  The
> solution is to add which().  Contrast
>
>> x = c(1,2,NA)
>> x == 2
> [1] FALSE  TRUE    NA
>> which(x == 2)
> [1] 2
>
> Kasper
>
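
To make the contrast above concrete, here is a minimal base-R sketch of what each form of index returns when used for subsetting. The vector x is the one from the quoted example; the names by_logical and by_which are made up for illustration. This only shows plain-vector behaviour, and a GRanges object may treat an NA in a logical subscript differently (for instance by refusing it).

```r
x <- c(1, 2, NA)

# A logical index containing NA yields an NA element in the result
by_logical <- x[x == 2]        # index is FALSE TRUE NA
print(by_logical)              # [1]  2 NA

# which() keeps only the positions that are TRUE, so the NA is dropped
by_which <- x[which(x == 2)]   # index is 2
print(by_which)                # [1] 2
```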

Thanks, I should have tried that before. This syntax works:

 > tss.annot.gr[na.omit(which(elementMetadata(tss.annot.gr)$GENE>0)),"GENE"]
GRanges with 3446 ranges and 1 elementMetadata col:
          seqnames                 ranges strand   |  GENE
             <Rle>              <IRanges>  <Rle>   |   <numeric>
      [1]     chr1   [ 4773791,  4776291]      -   | 1.063973966
      [2]     chr1   [ 5007460,  5009960]      -   | 1.668134677
      [3]     chr1   [16092486, 16094986]      -   | 1.748685661
      [4]     chr1   [36737931, 36740431]      -   | 1.465666717
      [5]     chr1   [38052053, 38054553]      -   | 1.750940655
      [6]     chr1   [38054354, 38056854]      +   | 1.677518675
      [7]     chr1   [39592146, 39594646]      +   | 0.696900841
      [8]     chr1   [40380974, 40383474]      +   | 0.777552281
      [9]     chr1   [40738056, 40740556]      +   | 0.511665769
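
A likely reason the earlier na.omit() attempt returned rows that violate the condition is that na.omit() was applied to the logical vector itself. That shortens the vector, so its positions no longer line up with the rows, and the shorter index is then recycled. The sketch below shows this with plain base-R vectors. The names gene and ids are hypothetical stand-ins for the GENE column and the ranges, and I am assuming GRanges subsetting behaved analogously here rather than having checked it.

```r
# Hypothetical stand-in for the GENE metadata column and its rows
gene <- c(0.5, NA, 2, NA, -1, 3)
ids  <- paste0("r", seq_along(gene))

keep <- gene > 0                       # TRUE NA TRUE NA FALSE TRUE

# na.omit() drops the NA entries, leaving a logical vector of length 4
shifted <- as.vector(na.omit(keep))    # TRUE TRUE FALSE TRUE

# The length-4 index is recycled over 6 rows and picks the wrong ones
wrong <- ids[shifted]
print(gene[ids %in% wrong])            # [1]  0.5   NA   NA -1.0  3.0

# which() returns positions in the original vector, so rows stay aligned
right <- ids[which(keep)]
print(gene[ids %in% right])            # [1] 0.5 2.0 3.0
```

Because which() never returns NA, wrapping it in na.omit() as in the working call above should be harmless but unnecessary.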

Mike

> On Wed, Jul 11, 2012 at 11:09 AM, Michael Muratet
> <mmuratet at hudsonalpha.org> wrote:
>> Greetings
>>
>> I would like to select elements from a GRanges object by testing values in
>> the metadata columns. This seems to work OK:
>>
>> x.gr[which(elementMetadata(x.gr)$fdr<0.05)]
>>
>> So does this, although there's nothing in the documentation about the []
>> operator accepting logical values:
>>
>> fosl2.th17.gr[elementMetadata(fosl2.th17.gr)$fdr<0.05]
>>
>> The problem arises when I try to select from a GRanges object where the
>> metadata columns have NAs:
>>
>>> tss.annot.gr[na.omit(elementMetadata(tss.annot.gr)$GENE>0),"GENE"]
>> GRanges with 4028 ranges and 1 elementMetadata col:
>>         seqnames                 ranges strand   |   GENE
>>            <Rle>              <IRanges>  <Rle>   |    <numeric>
>>     [1]     chr1   [ 3659579,  3662079]      -   |         <NA>
>>     [2]     chr1   [ 4847394,  4849894]      +   |            0
>>     [3]     chr1   [10025979, 10028479]      -   |         <NA>
>>     [4]     chr1   [17085879, 17088379]      -   |         <NA>
>>     [5]     chr1   [21067298, 21069798]      -   |         <NA>
>>     [6]     chr1   [21949662, 21952162]      -   |            0
>>     [7]     chr1   [23388014, 23390514]      -   |         <NA>
>>     [8]     chr1   [23768264, 23770764]      +   |         <NA>
>>     [9]     chr1   [23927128, 23929628]      -   |         <NA>
>>     ...      ...                    ...    ... ...          ...
>>  [4020]     chr2 [126607180, 126609680]      -   |            0
>>  [4021]     chr2 [127345106, 127347606]      -   |            0
>>  [4022]     chr2 [129195132, 129197632]      +   | -1.223140339
>>  [4023]     chr2 [129194856, 129197356]      -   | -1.628782357
>>  [4024]     chr2 [129360338, 129362838]      -   | -1.475535653
>>  [4025]     chr2 [129837609, 129840109]      +   |            0
>>  [4026]     chr2 [129948520, 129951020]      +   |            0
>>  [4027]     chr2 [140213446, 140215946]      -   |            0
>>  [4028]     chr2 [148267271, 148269771]      -   | -1.564551101
>>
>> The values returned violate the condition. It won't work at all without
>> na.omit.
>>
>> I can coerce the GRanges object to a data.frame, do the selection and create
>> a new GRanges object, but I'm hoping there is a way to do it directly.
>>
>> Am I using the syntax correctly? Is there something peculiar about a
>> DataFrame vs a data.frame that's getting in the way?
>>
>> Thanks
>>
>> Mike
>>
>>
>>
>> Michael Muratet, Ph.D.
>> Senior Scientist
>> HudsonAlpha Institute for Biotechnology
>> mmuratet at hudsonalpha.org
>> (256) 327-0473 (p)
>> (256) 327-0966 (f)
>>
>> Room 4005
>> 601 Genome Way
>> Huntsville, Alabama 35806
>>
