[BioC] Subscripting GenomicRanges objects with [[ or $

Fri Sep 3 10:04:37 CEST 2010

After sleeping on it overnight, I think I might go down the .get.column route.

It would be possible for me to do something like:

if( !existsMethod( '[[', 'GRanges' ) ) {
    setMethod("[[", "GRanges",
      function(x, i, j, ...) {
        # ...code...
      })
}

But I worry that this would, firstly, change GRanges behaviour globally from an external location (which could result in hard-to-track-down bugs that would be blamed on the GRanges package) and, secondly, break if a '[[' method for GRanges were defined elsewhere with a different meaning from the one my package expects.
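A package-local S4 generic is one way to get class-based dispatch without adding a '[[' method to GRanges at all: the method table belongs to your own generic, so nothing changes for other users of GenomicRanges. This is a minimal sketch, not an established API. The generic name getColumn() is hypothetical, the GRanges method is only registered when GenomicRanges is installed (so the sketch also runs without Bioconductor), and it uses mcols(), the later name for values().

```r
library(methods)

# Hypothetical generic owned by your own code; adding methods to it does
# not change how `[[` behaves on GRanges anywhere else.
setGeneric("getColumn", function(obj, field) standardGeneric("getColumn"))

# Default: objects that already support [[ (data.frame, RangedData, list)
setMethod("getColumn", "ANY", function(obj, field) obj[[field]])

# GRanges stores its metadata columns in a DataFrame, reached with
# mcols() (values() in older releases). Registered only if the package
# is installed, so the rest of the sketch runs without it.
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  setMethod("getColumn", "GRanges",
            function(obj, field) S4Vectors::mcols(obj)[[field]])
}

my.df <- data.frame(name = c("seq1", "seq2", "seq3"),
                    delta = c(1.23, 2.34, 3.45))
getColumn(my.df, "delta")
```

Inside a package, the GRanges class would normally be imported through the NAMESPACE file (importClassesFrom) rather than checked with requireNamespace().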

Cheers,

Tim

On 02/09/2010 20:08, "Michael Lawrence" <lawrence.michael at gene.com> wrote:

	I agree that inconsistencies are undesirable, but there are already enough inconsistencies between GRangesList and GRanges that writing a method for their union would not be a trivial exercise. In this case, a [[ method for GRanges would only be a short-cut, one that code written for both classes would need to avoid. A warning to this effect in the documentation may be sufficient.

	Michael

	On Wed, Sep 1, 2010 at 9:27 AM, Patrick Aboyoun <paboyoun at fhcrc.org> wrote:

		I am not sure where the design will lead, but another aspect of GRanges is that it has an accompanying GRangesList class for housing information such as the constituent exons in a transcript. There is a benefit for developers and script writers to having a similar mechanism for extracting these metadata columns for both class types. For a GRangesList, the [[/$ operators pull out a GRanges object for the selected transcript. So even if [[ and $ methods were added for GRanges, there would still be an issue for GRangesList objects.
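A minimal sketch of the distinction described above, assuming a current Bioconductor installation in which the GRanges() and GRangesList() constructors are available. The toy values come from the original post; the list element names tx1 and tx2 are made up for illustration.

```r
library(GenomicRanges)

# Toy data from the original post, built directly with GRanges();
# extra named arguments (name, delta) become metadata columns.
gr <- GRanges(seqnames = c("1", "2", "3"),
              ranges   = IRanges(start = c(10, 100, 1000),
                                 end   = c(20, 200, 2000)),
              strand   = c("+", "+", "-"),
              name     = c("seq1", "seq2", "seq3"),
              delta    = c(1.23, 2.34, 3.45))

# GRanges: metadata columns are reached through values()
# (mcols() in later releases; values() is kept as an alternative).
values(gr)[["name"]]

# GRangesList: [[ selects a list element, i.e. a whole GRanges such as
# the exons of one transcript, not a metadata column.
grl <- GRangesList(tx1 = gr[1:2], tx2 = gr[3])
first_tx <- grl[["tx1"]]
values(first_tx)[["delta"]]
```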

		Cheers,
		Patrick

		Quoting Michael Lawrence <lawrence.michael at gene.com>:

			On Wed, Sep 1, 2010 at 3:07 AM, Tim Yates <tyates at picr.man.ac.uk> wrote:

				Hi again,

				One of the really nice things about the RangedData object is that it could
				be treated (in general) the same way you would treat a data.frame, so it
				was possible to write methods that handled both object types the same way.

			This was one of the design goals. Unfortunately, RangedData has some strange
			behavior due to its internal structure. For example, it is not possible to
			reorder rows across spaces (chromosomes). Usually, this is not a big deal,
			but it can bite you. GRanges takes a simpler, flatter approach, but it was
			designed as a set of ranges with formal treatment of spaces, strands + extra
			information, rather than as a data frame with formal treatment of spaces and
			ranges (RangedData).

				I have a method which currently accepts a data.frame or a RangedData object,
				which I want to extend to allow GRanges objects as well.

				Without the [[ or $ subscript operators being implemented, would I need to
				have a switch based on the class of the parameter?

				As the values(obj)[['field']] method only works for GRanges objects (for
				RangedData, this method does not cause an error, it just returns NULL),

			Yes, there is an unfortunate conflict here. values() for RangedData returns
			the DataFrameList, so its names are the names of the chromosomes. I think
			you're better off adding a [[ method for GRanges objects, rather than a
			.get.column().

			Michael

				I guess I would need to write something like this:

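				.get.column <- function( obj, field ) {
				    # GRanges keeps its metadata columns in values();
				    # data.frame and RangedData support [[ directly.
				    # is() is TRUE for GRanges and for any subclass of it.
				    if( is( obj, 'GRanges' ) ) {
				        values( obj )[[ field ]]
				    } else {
				        obj[[ field ]]
				    }
				}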

				Then, call

				 .get.column(obj,'name')

				wherever I used to simply use

				 obj[['name']]

				before introducing GenomicRanges?
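The class test in .get.column compares class(obj) with a single string. That is fine for a plain GRanges object, but class() can return several strings (for example for an S3 object that extends data.frame), and since R 4.2.0 if() signals an error when its condition has length greater than one (earlier versions only warned). A minimal sketch comparing that test with inherits(); the function names and the my_df subclass are hypothetical, and for S4 objects is() from the methods package is the usual equivalent.

```r
# Hypothetical helper mirroring the class(obj) == 'GRanges' test
get_column_class_eq <- function(obj, field) {
  # Compares class() against one string; breaks if class() has length > 1
  if (class(obj) == "GRanges") values(obj)[[field]] else obj[[field]]
}

# Hypothetical helper using inherits() instead
get_column_inherits <- function(obj, field) {
  # inherits() always returns a single TRUE/FALSE and accepts subclasses
  if (inherits(obj, "GRanges")) values(obj)[[field]] else obj[[field]]
}

# Hypothetical S3 subclass of data.frame, so class() returns two strings
sub_df <- structure(data.frame(delta = c(1.23, 2.34, 3.45)),
                    class = c("my_df", "data.frame"))

get_column_inherits(sub_df, "delta")   # [1] 1.23 2.34 3.45

# Error in R >= 4.2.0 ("the condition has length > 1"), a warning before
try(get_column_class_eq(sub_df, "delta"))
```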

				Tim

				On 27/08/2010 15:02, "Martin Morgan" <mtmorgan at fhcrc.org> wrote:

				> On 08/27/2010 03:03 AM, Tim Yates wrote:
				>> Hi Richard,
				>>
				>> Ahhh..cool, yeah that works. Shame it's not a unified interface across
				>> all three datatypes though.
				>
				> These were intentional design decisions, made to reduce ambiguity about
				> which component of these complex objects a subscript operation is meant
				> to apply to; in the long run this makes it easier to write unambiguous,
				> easy-to-read code. Martin
				>
				>>
				>> Thanks for pointing me in the right direction though :-)
				>>
				>> Tim
				>>
				>> On 27/08/2010 10:31, "Richard Pearson" <richard.pearson at well.ox.ac.uk>
				>> wrote:
				>>
				>>> Hi Tim
				>>>
				>>> I think you need the values accessor method here:
				>>>
				>>> print( values(my.gr)[[ 'name' ]] )
				>>>
				>>> Cheers
				>>>
				>>> Richard
				>>>
				>>>
				>>> Tim Yates wrote:
				>>>> Hi all,
				>>>>
				>>>> I'm trying to move to using GRanges objects for storing my genomic features
				>>>> rather than the IRanges objects that I use currently.
				>>>>
				>>>> However, I cannot seem to subscript the GRanges object to extract a
				>>>> single column from the metadata of the object.
				>>>>
				>>>> Hopefully this code explains what I am trying to do, and someone can point
				>>>> me in the right direction?
				>>>>
				>>>> Cheers,
				>>>>
				>>>> Tim
				>>>>
				>>>>> library(GenomicRanges)
				>>>> Loading required package: IRanges
				>>>>
				>>>> Attaching package: 'IRanges'
				>>>>
				>>>>
				>>>>     The following object(s) are masked from package:base :
				>>>>
				>>>>      cbind,
				>>>>      Map,
				>>>>      mapply,
				>>>>      order,
				>>>>      paste,
				>>>>      pmax,
				>>>>      pmax.int,
				>>>>      pmin,
				>>>>      pmin.int,
				>>>>      rbind,
				>>>>      rep.int,
				>>>>      table
				>>>>
				>>>>> library(GenomicRanges)
				>>>>> my.starts  = c(     10,    100,   1000 )
				>>>>> my.ends    = c(     20,    200,   2000 )
				>>>>> my.spaces  = c(    '1',    '2',    '3' )
				>>>>> my.strands = c(    '+',    '+',    '-' )
				>>>>> my.names   = c( 'seq1', 'seq2', 'seq3' )
				>>>>> my.delta   = c(   1.23,   2.34,   3.45 )
				>>>>>
				>>>>> my.df = data.frame( start=my.starts, end=my.ends, space=my.spaces, strand=my.strands, name=my.names, delta=my.delta )
				>>>>> my.rd = as( my.df, 'RangedData' )
				>>>>> my.gr = as( my.rd, 'GRanges' )
				>>>>>
				>>>>
				>>>> # Extract the name field from each of these objects using [[
				>>>>
				>>>>> print( my.df[[ 'name' ]] )
				>>>> [1] seq1 seq2 seq3
				>>>> Levels: seq1 seq2 seq3
				>>>>> print( my.rd[[ 'name' ]] )
				>>>> [1] seq1 seq2 seq3
				>>>> Levels: seq1 seq2 seq3
				>>>>> print( my.gr[[ 'name' ]] )
				>>>> Error in my.gr[["name"]] : missing '[[' method for Sequence class GRanges
				>>>>
				>>>> # Extract the name field from each of these objects using $
				>>>>
				>>>>> print( my.df$'name' )
				>>>> [1] seq1 seq2 seq3
				>>>> Levels: seq1 seq2 seq3
				>>>>> print( my.rd$'name' )
				>>>> [1] seq1 seq2 seq3
				>>>> Levels: seq1 seq2 seq3
				>>>>> print( my.gr$'name' )
				>>>> Error in x[[name, exact = FALSE]] :
				>>>>   missing '[[' method for Sequence class GRanges
				>>>>> sessionInfo()
				>>>> R version 2.10.1 (2009-12-14)
				>>>> x86_64-apple-darwin9.8.0
				>>>>
				>>>> locale:
				>>>> [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
				>>>>
				>>>> attached base packages:
				>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
				>>>>
				>>>> other attached packages:
				>>>> [1] GenomicRanges_1.0.8 IRanges_1.6.15
				>>>> --------------------------------------------------------
				>>>> This email is confidential and intended solely for the
				u...{{dropped:15}}
				>>>>
				>>>> _______________________________________________
				>>>> Bioconductor mailing list
				>>>> Bioconductor at stat.math.ethz.ch
				>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
				>>>> Search the archives:
				>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
				>>>>
				>
