[BioC] views on Rle using GRanges object

Hervé Pagès hpages at fhcrc.org
Tue Aug 16 18:15:11 CEST 2011


Hi Janet,

On 11-08-15 05:25 PM, Janet Young wrote:
> Hi again,
>
> I have another question, in the "I think I need a convoluted workaround, but maybe I'm missing a simple solution" genre (seems like most of my questions are like that).
>
> I have an RleList object representing mapability for the whole human genome. I'd like to:
> (a) calculate viewMeans for various regions of interest that I've been representing as GRanges, and
> (b) I'd like the underlying code to be smart and match chromosome names up in the RleList and the GRanges object (not rely on chromosomes being ordered the same in the two objects), and
> (c) I'd like to return the viewMeans results in the same order as the objects in my original GRanges
>
> I don't think this is currently implemented without doing several coercions (that introduce their own complications) but I'm not sure.  Some code is below that shows what I'm trying to do.   It seems like it might be a common kind of way to use viewMeans - I imagine people are gradually switching to use GRanges more and more? but really I don't know.
>
> My real world question is that I've read in mapability from a UCSC bigWig file, and made that into an RleList.  I have a bunch of other regions I've read in (again from UCSC) using rtracklayer as GRanges, and have annotated those with various things I'm interested in (e.g. number of RNAseq reads in the region).  I want to add the average mapability for each region, so that I can later look at how mapability affects those other things I'm annotating.
>
> Should I be being more strict with myself about how my GRanges are ordered and making sure they all have the same seqlevels and seqlengths?  Perhaps that would help the coercion workarounds go more smoothly.
>
> thanks,
>
> Janet
>
>
> library(GenomicRanges)
>
> ### make an RleList. Just for this example, I starting with a GRanges object and used coverage to get an RleList example. To get my real data I read in a UCSC bigWig file.
>
> fakeRegions<- GRanges(seqnames=c("chrA","chrA", "chrB", "chrC"), ranges=IRanges(start=c(1,51,1,1), end=c(60,90,20,10) ) )
> seqlengths(fakeRegions)<- c(100,100,100)
> myRleList<- coverage(fakeRegions)
> rm(fakeRegions)
>
> myRleList
>
> # SimpleRleList of length 3
> # $chrA
> # 'integer' Rle of length 100 with 4 runs
> #   Lengths: 50 10 30 10
> #   Values :  1  2  1  0
> #
> # $chrB
> # 'integer' Rle of length 100 with 2 runs
> #   Lengths: 20 80
> #   Values :  1  0
> #
> # $chrC
> # 'integer' Rle of length 100 with 2 runs
> #   Lengths: 10 90
> #   Values :  1  0
>
> ### make some regions of interest
> myRegions<- GRanges(seqnames=c("chrB", "chrC", "chrB"), ranges=IRanges(start=c(1,1,5), end=c(20,20,10) ), geneNames=c("g1","g2","g3") )
> myRegions
> # GRanges with 3 ranges and 1 elementMetadata value
> #     seqnames    ranges strand |   geneNames
> #<Rle>  <IRanges>   <Rle>  |<character>
> # [1]     chrB   [1, 20]      * |          g1
> # [2]     chrC   [1, 20]      * |          g2
> # [3]     chrB   [5, 10]      * |          g3
> #
> # seqlengths
> #  chrB chrC
> #    NA   NA
>
> ## can't use GRanges directly
> Views( myRleList, myRegions)
> # Error in RleViewsList(rleList = subject, rangesList = start) :
> #   'rangesList' must be a RangesList object
>
> ## can't use a simple coercion
> Views( myRleList, as(myRegions,"RangesList") )
> # Error in .Method(..., FUN = FUN, MoreArgs = MoreArgs, SIMPLIFY = SIMPLIFY,  :
> #   all object lengths must be multiple of longest object length
>
> ### although I can use that coercion if I first give the GRanges object the same levels as the RleList to force the lists to have the same names as each other:
> seqlevels(myRegions)<- names(myRleList)
> viewMeans(Views( myRleList, as(myRegions,"RangesList") ) )
> # SimpleNumericList of length 3
> # [["chrA"]] numeric(0)
> # [["chrB"]] 1 1
> # [["chrC"]] 0.5
>
> ### getting close to a solution - but I'd have liked to have been able to unlist this and directly add to my GRanges object e.g.
> values(myRegions)$meanCov<- unlist(viewMeans(Views( myRleList, as(myRegions,"RangesList") ) ), use.names=FALSE)
> ### but the regions are in a different order to how I started, so the above command would associate the wrong scores with the wrong regions.

Yep, 'myRegions' would first need to be sorted by seqlevels for the
above to work properly:

   > oo <- order(as.factor(seqnames(myRegions)))
   > myRegions <- myRegions[oo]
   > myRegions
   GRanges with 3 ranges and 1 elementMetadata value
       seqnames    ranges strand |   geneNames
          <Rle> <IRanges>  <Rle> | <character>
   [1]     chrB   [1, 20]      * |          g1
   [2]     chrB   [5, 10]      * |          g3
   [3]     chrC   [1, 20]      * |          g2

   seqlengths
    chrA chrB chrC
      NA   NA   NA

Cheers,
H.

>
> sessionInfo()
>
> R version 2.13.1 (2011-07-08)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] GenomicRanges_1.4.6 IRanges_1.10.5
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list