[BioC] Subscripting GenomicRanges objects with [[ or $
Tim Yates
TYates at picr.man.ac.uk
Fri Sep 3 10:04:37 CEST 2010
After sleeping on it overnight, I think I might go the .get.column route.
It would be possible for me to do something like:
if( length( showMethods( '[[', classes='GRanges', inherited=F, showEmpty=F, printTo=F ) ) == 0 ) {
  setMethod("[[", "GRanges",
    function(x, i, j, ...) { ...code... } )
}
But I worry that this would firstly pollute the GRanges namespace globally from an external location (which could result in bugs that are hard to track down, but will be blamed on the GRanges package), and secondly break if the '[[' method was defined for GRanges elsewhere with a different meaning than I am expecting in my package.
Cheers,
Tim
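
The guard described above can also be written with existsMethod() from the methods package, which returns TRUE or FALSE directly, so no showMethods() output has to be inspected. What follows is only a minimal sketch under stated assumptions: ToyRanges is a made-up stand-in class (it is not part of GenomicRanges) so that the example runs without Bioconductor, and its meta slot is invented for illustration.

```r
library(methods)

# Hypothetical stand-in for GRanges: metadata columns live in a list slot.
setClass("ToyRanges", representation(meta = "list"))

# Record whether a '[[' method already exists for exactly this class.
had_method <- existsMethod("[[", "ToyRanges")

# Register the method only if nobody else has defined one.
if (!had_method) {
  setMethod("[[", "ToyRanges",
            function(x, i, j, ...) x@meta[[i]])
}

obj <- new("ToyRanges", meta = list(name = c("seq1", "seq2", "seq3")))
print(obj[["name"]])
```

The concern raised above still applies to this form: a method registered like this is visible to the whole session, not just to the package that defined it.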
On 02/09/2010 20:08, "Michael Lawrence" <lawrence.michael at gene.com> wrote:
I agree that inconsistencies are undesirable, but there are already enough inconsistencies between GRangesList and GRanges that writing a method for their union would not be a trivial exercise. In this case, it would only be a short-cut that would need to be avoided. A warning to this effect in the documentation may be sufficient.
Michael
On Wed, Sep 1, 2010 at 9:27 AM, Patrick Aboyoun <paboyoun at fhcrc.org> wrote:
I am not sure where the design will lead, but another aspect of GRanges is that it has an accompanying GRangesList class for housing information such as the constituent exons in a transcript. There is a benefit for developers and script writers to having a similar mechanism for extracting these metadata columns for both class types. For a GRangesList, the [[/$ operators pull out a GRanges object for the selected transcript. So even if [[ and $ methods were added for GRanges, there would still be an issue for GRangesList objects.
Cheers,
Patrick
Quoting Michael Lawrence <lawrence.michael at gene.com>:
On Wed, Sep 1, 2010 at 3:07 AM, Tim Yates <tyates at picr.man.ac.uk> wrote:
Hi again,
One of the really nice things about the RangedData object is that it could
be treated (in general) the same way you would treat a data.frame, so it was
possible to write methods that handled both object types the same way.
This was one of the design goals. Unfortunately, RangedData has some strange
behavior due to its internal structure. For example, it is not possible to
reorder rows across spaces (chromosomes). Usually, this is not a big deal,
but it can bite you. GRanges takes a simpler, flatter approach, but it was
designed as a set of ranges with formal treatment of spaces, strands + extra
information, rather than as a data frame with formal treatment of spaces and
ranges (RangedData).
I have a method which currently accepts a data.frame or a RangedData object
which I want to extend to allow GRanges objects as well.
Without the [[ or $ subscript operators being implemented, would I need to
have a switch based on the class of the parameter?
As the values(obj)[['field']] method only works for GRanges objects (for
RangedData, this method does not cause an error, it just returns NULL),
Yes, there is an unfortunate conflict here. values() for RangedData returns
the DataFrameList, so its names are the names of the chromosomes. I think
you're better off adding a [[ method for GRanges objects, rather than a
.get.column().
Michael
I guess I would need to write something like this:
.get.column = function( obj, field ) {
  # is() also matches subclasses of GRanges, unlike class( obj ) == 'GRanges'
  if( is( obj, 'GRanges' ) ) {
    values(obj)[[ field ]]
  }
  else {
    obj[[ field ]]
  }
}
Then, call
.get.column(obj,'name')
wherever I used to simply use
obj[['name']]
before introducing GenomicRanges?
Tim
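
The class switch in .get.column can alternatively be expressed through S4 dispatch, so that each class gets its own method and no explicit class test is needed. This is a sketch under assumptions, not an API provided by GenomicRanges: getColumn is a hypothetical generic name chosen for illustration, and the GRanges method is registered only when the GenomicRanges package can actually be loaded.

```r
library(methods)

# Hypothetical generic; plays the role of .get.column from the message above.
setGeneric("getColumn", function(obj, field) standardGeneric("getColumn"))

# Default method: anything that already supports [[ by column name
# (data.frame, list, and RangedData in the releases discussed here).
setMethod("getColumn", "ANY", function(obj, field) obj[[field]])

# GRanges method: go through the metadata accessor instead.
# values() is the accessor used in this thread; later releases also
# provide the same columns through mcols().
has_gr <- suppressWarnings(suppressPackageStartupMessages(
  require("GenomicRanges", quietly = TRUE, character.only = TRUE)))
if (has_gr) {
  setMethod("getColumn", "GRanges",
            function(obj, field) values(obj)[[field]])
}

my.df <- data.frame(name = c("seq1", "seq2", "seq3"),
                    delta = c(1.23, 2.34, 3.45),
                    stringsAsFactors = FALSE)
print(getColumn(my.df, "name"))
```

Because the generic belongs to the calling package, this avoids defining '[[' on a class owned by another package, at the cost of callers having to write getColumn(obj, 'name') rather than obj[['name']].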
On 27/08/2010 15:02, "Martin Morgan" <mtmorgan at fhcrc.org> wrote:
> On 08/27/2010 03:03 AM, Tim Yates wrote:
>> Hi Richard,
>>
>> Ahhh..cool, yeah that works. Shame it's not a unified interface across all
>> three datatypes though.
>
> These were intentional design decisions to reduce ambiguities in which
> of the components of these complex arguments subscript operations were
> meant to apply, in the long run making it easier to write unambiguous
> and easy to read code. Martin
>
>>
>> Thanks for pointing me in the right direction though :-)
>>
>> Tim
>>
>> On 27/08/2010 10:31, "Richard Pearson" <richard.pearson at well.ox.ac.uk>
>> wrote:
>>
>>> Hi Tim
>>>
>>> I think you need the values accessor method here:
>>>
>>> print( values(my.gr)[[ 'name' ]] )
>>>
>>> Cheers
>>>
>>> Richard
>>>
>>>
>>> Tim Yates wrote:
>>>> Hi all,
>>>>
>>>> I'm trying to move to using GRanges objects for storing my genomic features
>>>> rather than IRanges objects that I use currently.
>>>>
>>>> However, I cannot seem to subscript the Genomic Ranges object to extract a
>>>> single column from the meta-data of the object.
>>>>
>>>> Hopefully this code explains what I am trying to do, and someone can point
>>>> me in the right direction?
>>>>
>>>> Cheers,
>>>>
>>>> Tim
>>>>
>>>>> library(GenomicRanges)
>>>> Loading required package: IRanges
>>>>
>>>> Attaching package: 'IRanges'
>>>>
>>>>
>>>> The following object(s) are masked from package:base :
>>>>
>>>> cbind,
>>>> Map,
>>>> mapply,
>>>> order,
>>>> paste,
>>>> pmax,
>>>> pmax.int,
>>>> pmin,
>>>> pmin.int,
>>>> rbind,
>>>> rep.int,
>>>> table
>>>>
>>>>> library(GenomicRanges)
>>>>> my.starts = c( 10, 100, 1000 )
>>>>> my.ends = c( 20, 200, 2000 )
>>>>> my.spaces = c( '1', '2', '3' )
>>>>> my.strands = c( '+', '+', '-' )
>>>>> my.names = c( 'seq1', 'seq2', 'seq3' )
>>>>> my.delta = c( 1.23, 2.34, 3.45 )
>>>>>
>>>>> my.df = data.frame( start=my.starts, end=my.ends, space=my.spaces,
>>>> strand=my.strands, name=my.names, delta=my.delta )
>>>>> my.rd = as( my.df, 'RangedData' )
>>>>> my.gr = as( my.rd, 'GRanges' )
>>>>>
>>>>
>>>> # Extract the name field from each of these objects using [[
>>>>
>>>>> print( my.df[[ 'name' ]] )
>>>> [1] seq1 seq2 seq3
>>>> Levels: seq1 seq2 seq3
>>>>> print( my.rd[[ 'name' ]] )
>>>> [1] seq1 seq2 seq3
>>>> Levels: seq1 seq2 seq3
>>>>> print( my.gr[[ 'name' ]] )
>>>> Error in my.gr[["name"]] : missing '[[' method for Sequence class GRanges
>>>>
>>>> # Extract the name field from each of these objects using $
>>>>
>>>>> print( my.df$'name' )
>>>> [1] seq1 seq2 seq3
>>>> Levels: seq1 seq2 seq3
>>>>> print( my.rd$'name' )
>>>> [1] seq1 seq2 seq3
>>>> Levels: seq1 seq2 seq3
>>>>> print( my.gr$'name' )
>>>> Error in x[[name, exact = FALSE]] :
>>>> missing '[[' method for Sequence class GRanges
>>>>> sessionInfo()
>>>> R version 2.10.1 (2009-12-14)
>>>> x86_64-apple-darwin9.8.0
>>>>
>>>> locale:
>>>> [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] GenomicRanges_1.0.8 IRanges_1.6.15
>>>> --------------------------------------------------------
>>>> This email is confidential and intended solely for the
u...{{dropped:15}}
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>