[BioC] interoperability between IRanges/ShortRead and rtracklayer

Patrick Aboyoun paboyoun at fhcrc.org
Fri Jun 5 01:30:25 CEST 2009


Simon,
Below is code that achieves what I think you are after. I'll have to ask 
Michael if there is a cleaner way, and if there isn't we should create one.

Also I know the classes can be a bit bewildering and here is some 
information that may help you out. Both a RangedData object and a 
GenomeData object represent a list across spaces that (may or may not in 
the case of RangedData) represent chromosomes. A RangedData object is 
more rigid and requires you to have ranges within each of the spaces 
with possibly an associated rectangular data set. A GenomeData object is 
essentially a list where the elements represents unstructured 
information on the chromosomes (i.e. the elements can contain whatever 
you want them too). They relate to Rle objects in the following manner:  
an Rle object can produce a RangedData object of length 1 and a 
GenomeData object of length k containing Rle objects as elements can 
produce a RangedData object of length k.

I hoped that helped.


Patrick



#####################################################

suppressMessages(library(IRanges))
suppressMessages(library(rtracklayer))
suppressMessages(library(BSgenome))

myGenomeData <- GenomeData(list(chr10 = Rle(1:1000, 1:1000), chr11 = 
Rle(1:100, 100:1)))
myRangedData <- do.call(c, lapply(as.list(myGenomeData), as, "RangedData"))
export(myRangedData, "mytrack.wig", format = "wig")

 > myRangedData
RangedData: 1100 ranges by 1 columns
columns(1): score
sequences(2): chr10 chr11

 > sessionInfo()
R version 2.9.0 (2009-04-17)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] BSgenome_1.12.2   Biostrings_2.12.5 rtracklayer_1.4.0 RCurl_0.97-3    
[5] bitops_1.0-4.1    IRanges_1.2.2   

loaded via a namespace (and not attached):
[1] Biobase_2.4.1 XML_2.5-0   


Simon Anders wrote:
> Dear Patrick, Hervé, Michael or Martin
>
> I'm unclear on how to get from an IRanges Rle object to a RangedData 
> object to use with rtracklayer's export function.
>
> The use case is as follows:
>
> Let's say I caclulate a coverage vector from some ChIP-Seq data, e.g.:
>
>    library(IRanges)
>    library(ShortRead)
>    library(rtracklayer)
>    library(BSgenome.Mmusculus.UCSC.mm9)
>
>    # Load the example data from the ChIP-Seq tutorial
>    load("ctcf.rda")
>
>    # Calculate the coverage
>    cvg = coverage( ctcf, width = seqlengths( Mmusculus ) )
>
> Then, I get a GenomeData object that contains Rle objects:
>
>    > cvg
>    A GenomeData instance
>    chromosomes(3): chr10 chr11 chr12
>    > cvg$chr10
>      'integer' Rle instance of length 129987043 with 308238 runs
>      Lengths:  24 968 5 3 11 5 3 2 2 2 ...
>      Values :  1 0 1 2 3 4 3 4 3 4 ...
>
> Given that coverage data is a very typical case of something that one 
> would like to display as a track in a genome browser, it would be nice 
> if the rtracklayer object could deal with it. For example I might want 
> to export the 'cvg' object as a wiggle file with rtracklayer's 
> 'export' function or add it to a browser session with the 'track<-' 
> function.
>
> However, 'export' cannot deal with it
>
> > export( cvg, "wig" )
> Error in function (classes, fdef, mtable)  :
>   unable to find an inherited method for function "export.ucsc", for 
> signature "GenomeData"
> > export( cvg$chr10, "wig" )
> Error in function (classes, fdef, mtable)  :
>   unable to find an inherited method for function "export.ucsc", for 
> signature "Rle"
>
> It seems that it expects a RangedData object, or, alternatively, a 
> RangedDataList.
>
> It seems to me that there should be a one-to-one correspondance from 
> Rle to RangedData and from GenomeData to RangedDataList. How can I 
> convert from one to the other? And why are there both kinds of classes?
>
> Thanks in advance.
>
>   Simon
>
>
> +---
> | Dr. Simon Anders, Dipl. Phys.
> | European Bioinformatics Institute, Hinxton, Cambridgeshire, UK
> | office phone +44-1223-492680, mobile phone +44-7505-841692
> | preferred (permanent) e-mail: sanders at fs.tum.de



More information about the Bioconductor mailing list